
Component Name: Bag of words
Author: M. Makridis (makridis@iti.gr)
Language: C/C++
Prerequisites:
- OpenCV 2.1 or higher (latest 2.3.1) - http://opencv.willowgarage.com/wiki/
Description: This component implements a bag of words (BoW) technique. For each local
feature, a histogram is extracted, and the Reddi multi-thresholding technique is used to
cluster local feature values into words. Words are defined by the extracted Reddi thresholds.
Ref: Automatic Classification of Archaeological Pottery Sherds_submitted.doc
THE REDDI COMPONENT IS A PREREQUISITE FOR THE BAG OF WORDS COMPONENT
Functions
ReddiGT: The main function for applying Reddi multi-thresholding
(it can be found in the Reddi component).
ExtractClassifyArr: This function performs clustering
based on the extracted Reddi thresholds.
Input parameters:
int NumofTh: Number of thresholds.
vector<vector<int>> &ClassifyArrFront: This vector holds the word ranges
on the histogram of all features (second dimension). For example, if
there is a threshold T between histogram bins A and B,
ClassifyArrFront holds the ranges [A,T] and [T,B] in order to
cluster feature values. It is used for front-view images of sherds.
(A sketch of this layout follows the parameter list.)
vector<vector<int>> &ClassifyArrBack: The same as ClassifyArrFront
for back views of sherds.
vector<vector<int>> ThressarrayFront: Vector that holds the thresholds of
all features. It is used for front-view images of sherds.
vector<vector<int>> ThressarrayBack: The same as ThressarrayFront for
back views of sherds.
int Features: The number of features.
Output parameters:
vector<vector<int>> &ClassifyArrFront: See the input parameters.
vector<vector<int>> &ClassifyArrBack: See the input parameters.
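To make the ClassifyArr layout concrete, the following is a minimal C++ sketch (not the
actual ExtractClassifyArr code) of how the word ranges of one feature could be derived from
its sorted Reddi thresholds; the function name and the boundary representation are
assumptions of this sketch.

#include <vector>

// Hypothetical helper: turn the sorted Reddi thresholds of one feature into
// word boundaries, so that word w covers the histogram range
// [bounds[w], bounds[w + 1]]; e.g. a single threshold T gives [0,T] and [T,255].
std::vector<int> ThresholdsToWordBounds(const std::vector<int>& thresholds)
{
    std::vector<int> bounds;
    bounds.push_back(0);                        // the first word starts at histogram bin 0
    for (size_t k = 0; k < thresholds.size(); ++k)
        bounds.push_back(thresholds[k]);        // each threshold closes one word
    bounds.push_back(255);                      // the last word ends at the top bin
    return bounds;
}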
ExtractGTDescriptors: This function calculates each pixel's
membership to each class based on the Reddi ranges.
Input parameters:
int NumofTh: Number of thresholds
string cStr: The path and the filename of the corresponding front
view image.
string cStr2: The path and the filename of the corresponding back
view image.
vector<vector<vector<int>>> FeatGTFront: A 3D vector of size Height
x Width x NumberOfFeatures, which depends on the size of the
corresponding image and the total number of features. It stores
local feature values on a per-pixel basis and supports the clustering
performed in this function.
vector<vector<vector<int>>> FeatGTBack: The same as FeatGTFront for
back views.
vector<vector<double>> &GTDescriptors: A 2D vector that holds the feature
vectors extracted by the BoW method, concatenated together. (A sketch of
how such a descriptor could be assembled follows the parameter list.)
vector<vector<int>> ClassifyArrFront: This vector holds the word ranges
on the histogram of all features (second dimension). For example, if
there is a threshold T between histogram bins A and B,
ClassifyArrFront holds the ranges [A,T] and [T,B] in order to
cluster feature values. It is used for front-view images of sherds.
vector<vector<int>> ClassifyArrBack: The same as ClassifyArrFront for
back views of sherds.
int pixelsGT: The total number of pixels over all ground truth sherd
images.
int Features: The number of features.
Output parameters:
vector<vector<double>> &GTDescriptors: A 2D vector that holds the feature
vectors extracted by the BoW method, concatenated together.
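As a rough, hedged sketch of the per-pixel membership and concatenation step described
above (not the component's actual implementation), one front- or back-view descriptor could
be assembled as follows. The function name is hypothetical, and it assumes that
ClassifyArr[f] holds the word boundaries of feature f, with word w covering
[ClassifyArr[f][w], ClassifyArr[f][w+1]].

#include <vector>

// Hypothetical sketch of the BoW step inside ExtractGTDescriptors:
// FeatGT is Height x Width x Features; ClassifyArr[f] holds the word
// boundaries of feature f.
std::vector<double> BuildDescriptor(
    const std::vector<std::vector<std::vector<int>>>& FeatGT,
    const std::vector<std::vector<int>>& ClassifyArr,
    int Features)
{
    std::vector<double> descriptor;
    for (int f = 0; f < Features; ++f) {
        const std::vector<int>& bounds = ClassifyArr[f];
        std::vector<double> counts(bounds.size() - 1, 0.0);   // one bin per "word"
        for (size_t i = 0; i < FeatGT.size(); ++i)
            for (size_t j = 0; j < FeatGT[i].size(); ++j) {
                int v = FeatGT[i][j][f];
                // assign the pixel's feature value to the word whose range contains it
                for (size_t w = 0; w + 1 < bounds.size(); ++w)
                    if (v >= bounds[w] && v <= bounds[w + 1]) { ++counts[w]; break; }
            }
        descriptor.insert(descriptor.end(), counts.begin(), counts.end());
    }
    return descriptor;
}

In the component itself, front and back view descriptors would presumably be built with
ClassifyArrFront and ClassifyArrBack respectively and stored together in GTDescriptors.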
Theory
Initially, bag of words (BoW) models were applied to words in documents and were
based on the frequency of appearance of each word, without preserving the order of
appearance within a sentence. Similar models were soon applied to image features.
Here, a new technique for creating a bag of words is proposed, using the Reddi
multi-thresholding method. The latter is based on the maximization of the inter-class
variance between different classes, which can be seen as valley detection on a histogram.
Once this maximization is achieved, each cluster's range can be easily calculated as the
range between neighboring thresholds.
More specifically, let us assume an image $I$ with dimensions $K \times L$. The proposed BoW
model can be described by the following steps:
Firstly, histogram extraction for each local feature $f_s$ takes place according to the
following equation:

$$h_{f_s}(x) = \sum_{i=0}^{K} \sum_{j=0}^{L} t\big(f_s(i,j)\big), \quad x = 0, 1, 2, \ldots, 255 \qquad (1)$$

where $f_s(i,j)$ is the feature value at coordinates $(i,j)$ and

$$t\big(f_s(i,j)\big) = \begin{cases} 1, & \text{if } f_s(i,j) = x \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
Then, the accumulative histogram $AH_{f_s}$ is created for each feature and over all ground
truth sherds according to the following equation:

$$AH_{f_s}(x) = \sum_{i=0}^{N} h_{i,f_s}(x) \qquad (3)$$
where N is the total number of ground truth sherd images, which is equal to the total
number of sherd classes (in case of one ground truth per class).
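The accumulation of Eq. (3) is then a per-bin sum over the ground truth histograms; below is
a sketch under the assumption that one 256-bin histogram per ground truth sherd image
(as produced by the sketch above) is already available:

#include <vector>

// Accumulative histogram AH_{f_s} over the N ground truth sherd images (Eq. 3).
// perImageHist[i] is h_{i,f_s}, the 256-bin histogram of feature f_s on image i.
std::vector<long long> AccumulativeHistogram(
    const std::vector<std::vector<int>>& perImageHist)
{
    std::vector<long long> AH(256, 0);
    for (size_t i = 0; i < perImageHist.size(); ++i)   // sum over all ground truth images
        for (int x = 0; x < 256; ++x)
            AH[x] += perImageHist[i][x];               // AH_{f_s}(x) = sum_i h_{i,f_s}(x)
    return AH;
}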
Finally, Reddi multi-thresholding is applied to each feature's accumulative histogram
$AH_{f_s}$. "Words" are created from the feature values and the extracted thresholds, as
described above.
Using this transformation, the dimensionality of each feature's contribution to the final
global feature vector is reduced from 256 (all histogram bins) to the number of "words".
Since we have $N$ ground truth sherd images, we define $N - 1$ thresholds, leading to $N$
"words" in each feature's histogram.
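As a small illustration of this 256-to-$N$ reduction (a sketch, not the component's code),
the bins of an accumulative histogram can be pooled into the $N$ word ranges defined by the
$N - 1$ sorted Reddi thresholds; the boundary handling below is an assumption of this sketch:

#include <vector>

// Pool a 256-bin accumulative histogram into N "words" using the N-1 sorted Reddi
// thresholds: word 0 covers bins [0, T1], word k covers (T_k, T_{k+1}], and the last
// word covers (T_{N-1}, 255].
std::vector<long long> PoolHistogramIntoWords(const std::vector<long long>& AH,
                                              const std::vector<int>& thresholds)
{
    std::vector<long long> words(thresholds.size() + 1, 0);   // N = (N - 1) + 1 words
    size_t w = 0;
    for (int x = 0; x < 256; ++x) {
        while (w < thresholds.size() && x > thresholds[w])    // move on to the next word
            ++w;
        words[w] += AH[x];
    }
    return words;
}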
After the BoW step, all local features are concatenated, forming a global descriptor
vector that describes the whole sherd image. A graphical presentation of the proposed BoW
technique is depicted in the figure below.
Figure: Graphical presentation of the proposed bag of words procedure.
Download