Component Name: Bag of words
Author: M. Makridis (makridis@iti.gr)
Language: C/C++
Prerequisites:
- OpenCV 2.1 or higher (latest 2.3.1) - http://opencv.willowgarage.com/wiki/

Description:
This component implements a bag of words (BoW) technique. For each local feature, a histogram is extracted. The Reddi multi-thresholding technique is used to cluster the local feature values into words; the words are defined by the extracted Reddi thresholds.
Ref: Automatic Classification of Archaeological Pottery Sherds_submitted.doc
THE REDDI COMPONENT IS A PREREQUISITE FOR THE BAG OF WORDS COMPONENT

Functions

ReddiGT: The main function of the Reddi multi-thresholding application (it can be found in the Reddi component).

ExtractClassifyArr: This function performs clustering based on the extracted Reddi thresholds (a minimal illustrative sketch is given after the function descriptions).
Input parameters:
int NumofTh: The number of thresholds.
vector<vector<int>> &ClassifyArrFront: This vector holds the word ranges on the histogram of every feature (second dimension). For example, if there is a threshold T between histogram bins A and B, ClassifyArrFront holds the ranges [A,T] and [T,B] in order to cluster feature values. (It is used for front-view images of sherds.)
vector<vector<int>> &ClassifyArrBack: The same as ClassifyArrFront for back views of sherds.
vector<vector<int>> ThressarrayFront: Vector that holds the thresholds of all features. (It is used for front-view images of sherds.)
vector<vector<int>> ThressarrayBack: The same as ThressarrayFront for back views of sherds.
int Features: The number of features.
Output parameters:
vector<vector<int>> &ClassifyArrFront: See input parameters.
vector<vector<int>> &ClassifyArrBack: See input parameters.

ExtractGTDescriptors: This function calculates the membership of each pixel to each class, based on the Reddi ranges.
Input parameters:
int NumofTh: The number of thresholds.
string cStr: The path and filename of the corresponding front-view image.
string cStr2: The path and filename of the corresponding back-view image.
vector<vector<vector<int>>> FeatGTFront: A 3D vector of size Height x Width x NumberOfFeatures, which depends on the size of the corresponding image and the total number of features. It stores the local feature values on a per-pixel basis and supports the clustering performed in this function.
vector<vector<vector<int>>> FeatGTBack: The same as FeatGTFront for back views.
vector<vector<double>> &GTDescriptors: A 2D vector that holds the feature vectors extracted by the BoW method, concatenated together.
vector<vector<int>> ClassifyArrFront: This vector holds the word ranges on the histogram of every feature (second dimension). For example, if there is a threshold T between histogram bins A and B, ClassifyArrFront holds the ranges [A,T] and [T,B] in order to cluster feature values. (It is used for front-view images of sherds.)
vector<vector<int>> ClassifyArrBack: The same as ClassifyArrFront for back views of sherds.
int pixelsGT: The total number of pixels over all ground truth sherd images.
int Features: The number of features.
Output parameters:
vector<vector<double>> &GTDescriptors: A 2D vector that holds the feature vectors extracted by the BoW method, concatenated together.
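The following minimal sketch illustrates how word ranges could be built from the Reddi thresholds, in the spirit of ExtractClassifyArr. It is not the component's actual implementation: the helper name BuildClassifyArr and the flattened [start, end] layout of the second dimension are assumptions made only for illustration.

#include <vector>
using std::vector;

// Sketch only: build word ranges per feature from the Reddi thresholds.
// Thressarray[f] holds the thresholds of feature f; ClassifyArr[f] receives
// flattened [start, end] pairs, one pair per word. NumofTh thresholds on the
// histogram range [0, 255] produce NumofTh + 1 word ranges per feature.
static void BuildClassifyArr(int NumofTh,
                             const vector<vector<int> > &Thressarray,
                             vector<vector<int> > &ClassifyArr,
                             int Features)
{
    ClassifyArr.assign(Features, vector<int>());
    for (int f = 0; f < Features; ++f)
    {
        int start = 0; // histogram bins cover [0, 255]
        for (int t = 0; t < NumofTh && t < (int)Thressarray[f].size(); ++t)
        {
            // Range [start, T]: all feature values inside it map to the same "word".
            ClassifyArr[f].push_back(start);
            ClassifyArr[f].push_back(Thressarray[f][t]);
            start = Thressarray[f][t];
        }
        // Last range: from the last threshold up to the last histogram bin.
        ClassifyArr[f].push_back(start);
        ClassifyArr[f].push_back(255);
    }
}

Under these assumptions the routine would be called twice, once with ThressarrayFront to fill ClassifyArrFront and once with ThressarrayBack to fill ClassifyArrBack.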
Theory

Initially, bag of words (BoW) models were applied to words in documents and were related to the frequency of appearance of each word, without preserving the order in which the words appear in a sentence. Similarly to these models, bag of words models were soon applied to image features. Here, a new technique for creating a bag of words is proposed, based on the Reddi multi-thresholding method. The latter maximizes the inter-class variance between the different classes, which can be seen as valley detection on a histogram. Once the maximization is achieved, each cluster's range can easily be calculated as the range between neighboring thresholds.

More specifically, let us assume an image I with dimensions K x L. The proposed BoW model can be described by the following steps. Firstly, a histogram is extracted for each local feature F_f according to the following equation:

h_f(x) = \sum_{i=0}^{K} \sum_{j=0}^{L} t(F_f(i,j)), \quad x = 0, 1, 2, \ldots, 255    (1)

where F_f(i,j) is the feature value at coordinates (i,j) and

t(F_f(i,j)) = \begin{cases} 1, & \text{if } F_f(i,j) = x \\ 0, & \text{otherwise} \end{cases}    (2)

Then, the accumulative histogram AH_f is created for each feature over all ground truth sherds according to the following equation:

AH_f(x) = \sum_{n=0}^{N} h_{n,f}(x)    (3)

where N is the total number of ground truth sherd images, which is equal to the total number of sherd classes (in the case of one ground truth per class).

Finally, Reddi multi-thresholding is applied to each feature's accumulative histogram AH_f. "Words" are created according to the feature values and the extracted thresholds, as described above. Using this transformation, the dimensionality of the final global feature vector is reliably decreased from 256 (all histogram bins) to the final number of thresholds. Since there are N ground truth sherd images, N - 1 thresholds are defined, leading to N "words" in each feature's histogram. After the BoW step, all local features are concatenated, forming a global descriptor vector that describes the whole sherd image. A graphical presentation of the proposed BoW technique is depicted in the figure below.

Figure: Graphical presentation of the proposed bag of words procedure.
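To make the pipeline concrete, the sketch below assembles a global descriptor for one view: per-pixel feature values are quantized into the word ranges and the resulting word counts are concatenated over all features. It is not the component's ExtractGTDescriptors implementation; the function name, the flattened range layout, and the normalization by the pixel count are assumptions made only for illustration.

#include <vector>
using std::vector;

// Sketch only: count, for every feature, how many pixels fall inside each word
// range (the histogram of equation (1), binned by the Reddi ranges) and
// concatenate the per-feature word histograms into one global descriptor.
// FeatGT[i][j][f] holds the value of feature f at pixel (i, j); ClassifyArr[f]
// holds flattened [start, end] word ranges, as in the previous sketch.
static vector<double> ExtractBoWDescriptor(const vector<vector<vector<int> > > &FeatGT,
                                           const vector<vector<int> > &ClassifyArr,
                                           int Features)
{
    vector<double> descriptor;
    const int height = (int)FeatGT.size();
    const int width  = height > 0 ? (int)FeatGT[0].size() : 0;
    const double pixels = (double)(height * width);

    for (int f = 0; f < Features; ++f)
    {
        const int words = (int)ClassifyArr[f].size() / 2; // one word per [start, end] pair
        vector<double> hist(words, 0.0);
        for (int i = 0; i < height; ++i)
            for (int j = 0; j < width; ++j)
            {
                const int v = FeatGT[i][j][f];
                for (int w = 0; w < words; ++w)
                    if (v >= ClassifyArr[f][2 * w] && v <= ClassifyArr[f][2 * w + 1])
                    {
                        hist[w] += 1.0; // pixel (i, j) votes for word w of feature f
                        break;
                    }
            }
        for (int w = 0; w < words; ++w)
            descriptor.push_back(pixels > 0.0 ? hist[w] / pixels : 0.0); // concatenate words of feature f
    }
    return descriptor; // length = Features x (number of words per feature)
}

In this sketch the front-view and back-view descriptors would be computed separately (from FeatGTFront/ClassifyArrFront and FeatGTBack/ClassifyArrBack) and concatenated into the final global descriptor of the sherd.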