Word & Pic HW3 – Report (Naive Bayes)

Students: Yulong Wang (#108319008), Bo Feng (#108809282)
In this homework, we train a Naive Bayes classifier on bag-of-words text features and bag-of-keypoints (SIFT) image features.
Data Preparation
We have four bag categories (hobo, shoulder, clutch, totes). The first step is to split the data of every category into two parts: for each bag category, the images numbered 500 and below form the training set, and the remaining images, numbered above 500, form the test set.
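A minimal MATLAB sketch of this split is shown below; it assumes each category's images are stored as <category>/<number>.jpg, which may differ from the actual file layout.

```matlab
% Split images of each category into training (<= 500) and test (> 500)
% sets, assuming files named like hobo/123.jpg.
categories = {'hobo', 'shoulder', 'clutch', 'totes'};
trainFiles = cell(1, numel(categories));
testFiles  = cell(1, numel(categories));
for c = 1:numel(categories)
    files = dir(fullfile(categories{c}, '*.jpg'));
    nums  = cellfun(@(n) sscanf(n, '%d'), {files.name});
    trainFiles{c} = {files(nums <= 500).name};
    testFiles{c}  = {files(nums >  500).name};
end
```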
Part 1 - Image Representation
The idea of this part is similar to the bag-of-words feature; the paper we read calls it the "bag of keypoints" feature. We therefore model each image as a histogram of visual word counts, where the visual words are found by clustering SIFT features.
For each image, we first extract its SIFT descriptors as image features using vl_sift() from the vlfeat package.
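A short sketch of this extraction step (the image path is only an example):

```matlab
% Extract SIFT keypoints and descriptors for one image.
% vl_sift expects a single-precision grayscale image.
im = imread('hobo/1.jpg');               % example path
if size(im, 3) == 3
    im = rgb2gray(im);
end
[frames, descriptors] = vl_sift(single(im));
% descriptors is a 128-by-K matrix, one column per keypoint.
```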
We then take a large random subset of the SIFT features from the training data and cluster it into a visual word codebook with k-means, using vl_kmeans(). We tried several subset sizes: 10000, 20000, 40000, 80000, and 160000 (out of roughly 230000 features in total), and found that the results were not affected much by this choice. The reason is that vl_sift() produces so many interest points that a relatively large subset, chosen uniformly at random, is already sufficient for clustering.
To choose the subset, we sample evenly from each category. For example, to choose 80000 points over the four categories, we randomly pick 20000 points from each category, as sketched below.
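A sketch of the sampling and clustering step; allDescs is an assumed cell array holding the pooled 128-by-N descriptor matrix of each category's training images:

```matlab
% Sample descriptors evenly from each category, then build the codebook.
subsetSize  = 80000;                       % total points to sample
perCategory = subsetSize / numel(categories);
sampled = [];
for c = 1:numel(categories)
    D   = allDescs{c};                     % 128-by-Nc descriptors of category c
    idx = randperm(size(D, 2), perCategory);
    sampled = [sampled, D(:, idx)];
end
% Cluster the sampled descriptors into K visual words.
K = 1000;                                  % example codebook size
centers = vl_kmeans(single(sampled), K);   % 128-by-K cluster centers
```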
To choose the number of cluster centers, we analyzed the classification error for different numbers of clusters, ranging from 100 to 3000 in steps of 100.
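The sweep itself can be sketched as below, with evaluateError a hypothetical helper that rebuilds the codebook, retrains the classifier, and returns the test error for a given number of clusters:

```matlab
% Sweep the codebook size and record the error for each setting.
Ks   = 100:100:3000;
errs = zeros(size(Ks));
for i = 1:numel(Ks)
    errs(i) = evaluateError(Ks(i));        % hypothetical helper
end
plot(Ks, errs); xlabel('number of visual words'); ylabel('error');
```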
For each image, we label each of its SIFT features with the closest visual word and compute the image representation as a histogram of visual word counts.
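A sketch of this assignment step for one image, where D is the image's 128-by-N descriptor matrix and centers is the 128-by-K codebook from above:

```matlab
% Assign each descriptor to its nearest visual word and count.
dists      = vl_alldist2(single(D), centers);   % N-by-K squared distances
[~, words] = min(dists, [], 2);                 % nearest center per descriptor
imageHist  = accumarray(words, 1, [K, 1]);      % histogram of visual word counts
```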
Part 2 - Text Representation
We model text descriptions as histograms of word counts, just as in the previous homework. Our lexicon consists of all words in the shopping descriptions that appear relatively frequently. As usual, we use a regular expression to extract lowercase word tokens. For each text document, we then compute the text representation as a histogram of word counts over this lexicon.
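A small sketch of this step, assuming docs is a cell array of description strings and lexicon the chosen list of frequent words:

```matlab
% Tokenize each description and count occurrences of each lexicon word.
tokens = cellfun(@(s) regexp(lower(s), '[a-z]+', 'match'), docs, ...
                 'UniformOutput', false);
textHist = zeros(numel(docs), numel(lexicon));
for d = 1:numel(docs)
    for w = 1:numel(lexicon)
        textHist(d, w) = sum(strcmp(tokens{d}, lexicon{w}));
    end
end
```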
Part 3 - Training Classifiers
In this part, we trained our Naive Bayes classifier by first calculating P(Fi|Cj) for each feature i and each category j. As P(Cj) is uniform over the 4 categories, we can simply ignore that term in our calculation.
Each visual and textual word is a feature in our Naive Bayes classifier. We use the training data for each bag category to calculate the probability of features given categories from counts with Laplace smoothing (simple add-1 smoothing):
P(Fi|Cj) = (count(Fi, Cj) + 1) / (Σk count(Fk, Cj) + V)
where count(Fi, Cj) is the total count of feature i in the training data of category j and V is the total number of distinct features.
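A sketch of this training step, where trainHist{j} is assumed to be the Nj-by-V matrix of feature counts (visual plus textual words) for the Nj training examples of category j:

```matlab
% Estimate log P(Fi | Cj) for every feature and category with add-1 smoothing.
numCategories = 4;
V    = size(trainHist{1}, 2);              % number of features
logP = zeros(numCategories, V);
for j = 1:numCategories
    counts     = sum(trainHist{j}, 1);     % total count of each feature in category j
    logP(j, :) = log((counts + 1) / (sum(counts) + V));
end
```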
Part 4 - Image Classification and Confusion
Here we classify the test images with the trained classifier. For each test image, we calculate the probability of each category using the classifier (see the Csurka paper, Eqn. 1, for this calculation) and label the image with the highest-probability category.
We then compute a confusion matrix showing, for each category, what percentage of images from that category were confused with each of the other categories (see the Csurka paper for an example). The diagonal of this matrix is the per-category accuracy.
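A sketch of the classification and confusion matrix computation, where testHist{c} is assumed to hold the feature-count matrix of the test images whose true category is c, and logP is the matrix from the training sketch above:

```matlab
% Classify every test image and accumulate the confusion matrix.
numCategories = 4;
confusion = zeros(numCategories);
for c = 1:numCategories
    % With a uniform prior, the score of category j is the sum of feature
    % log-likelihoods weighted by the observed counts.
    scores = testHist{c} * logP';          % one row of scores per test image
    [~, predicted] = max(scores, [], 2);
    confusion(c, :) = accumarray(predicted, 1, [numCategories, 1])' / numel(predicted);
end
% The diagonal of `confusion` gives the per-category accuracy.
```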