Image Miner: An Architecture to Support Deep Mining of Images

Image Miner: An Architecture to Support Deep Mining
of Images
by
Edwin Meng Zhang
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 22, 2015
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Kalyan Veeramachaneni
Research Scientist
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Professor Albert Meyer
Chairman, Master of Engineering Thesis Committee
Image Miner: An Architecture to Support Deep Mining of Images
by
Edwin Meng Zhang
Submitted to the Department of Electrical Engineering and Computer Science
on May 22, 2015, in partial fulfillment of the
requirements for the degree of
Master of Science in Computer Science and Engineering
Abstract
In this thesis, I designed a cloud-based system, called ImageMiner, to tune the parameters of the feature extraction process in a machine learning pipeline for images. Feature extraction is a key component of the machine learning pipeline, and tuning its parameters to extract the best features can have a significant effect on the accuracy achieved by the machine learning system. To enable scalable parameter tuning, I designed a master-slave architecture to run on the Amazon cloud. To overcome the computational bottlenecks due to large datasets, I used a data-parallel approach where each worker runs independently on a subset of the data. Each worker uses a Gaussian Copula Process to tune parameters and determines the best set of parameters and model to use.
Thesis Supervisor: Kalyan Veeramachaneni
Title: Research Scientist
Acknowledgments
I would like to thank Kalyan Veeramachaneni for supporting and guiding me through my
thesis project. His guidance was critical to the completion of this project. I would also like to
thank everyone in the ALFA lab for helping me and providing a good working environment.
I would like to thank my girlfriend, Riana Lo Bu, for her support and keeping me grounded
through all the highs and lows during this process.
I would also like to thank all my friends who helped me through MIT and who made the
experience fun.
Finally, I would also like to thank my family, and especially my parents, for getting me
to where I am today and helping me through every step of the process.
Contents
1 Introduction  11
  1.1 What is ImageMiner?  13

2 The Geolocation Problem  15
  2.1 The Data Set  15
  2.2 Multiple Methodologies  17
    2.2.1 Classification  17
    2.2.2 Regression  17
    2.2.3 Hierarchical  18
  2.3 Past Approaches  18
  2.4 Our Approach  20

3 ImageMiner  23
  3.1 Goal  23
  3.2 Possible designs  24
    3.2.1 Data Parallel Design  25
    3.2.2 Parallel Iteration Design  25
  3.3 ImageMiner Architecture  25
    3.3.1 Master  27
    3.3.2 Slave  28
    3.3.3 Other Systems  28
  3.4 Training Module  31
    3.4.1 User Input  31
    3.4.2 Master  31
    3.4.3 Slave  32
  3.5 Testing Module  33
    3.5.1 User Input  33
    3.5.2 Master  33
    3.5.3 Slave  34
  3.6 Parameter Tuning Algorithm  34
  3.7 Summary of ImageMiner  35

4 Experiments and Results  41

5 Experience of designing the system  47
  5.1 Data pre-processing and preparation  47
  5.2 ImageMiner Architecture  48
  5.3 System issues  49
    5.3.1 Installing libraries  49
    5.3.2 Instance limits  49
  5.4 Integrating Python Libraries  50
  5.5 Feature Extraction  51
  5.6 SVM Classifier  52
  5.7 Processing results  52
  5.8 Miscellaneous  52
  5.9 Lessons Learned  53

6 Conclusion  55
  6.1 Future Work  55
    6.1.1 User Flexibility  55
    6.1.2 Speed  56
  6.2 Future Goals  56

A ImageMiner Interface  57
  A.1 Training  57
  A.2 Testing  59

B Results  61
Chapter 1
Introduction
With the adoption of smartphones, the number of pictures available on the web has increased tremendously. For example, on Flickr in 2005, fewer than 10 million images were being uploaded per month. In 2014, that number was over 60 million images per month [12]. As the number of pictures on the Internet grows, there is ever-increasing interest in performing machine learning on these images. Researchers want to be able to extract
information from these pictures to do something useful with them. Common examples are
performing an image search for similar pictures, identifying where an image was taken, or
doing object recognition or scene understanding. Algorithms to do these activities fall under
the area of machine learning called computer vision.
Most approaches in computer vision follow the same basic steps.
Pre-processing step After acquiring the images, pre-process each image to get it into the desired format and extract useful metadata from the image.
Tag and label retrieval Tags, labels, and latitude and longitude information are extracted
from the images for future use, when available.
Feature Extraction Extract visual features from the images via one of the many feature
extraction libraries available, such as SIFT or GIST.
Training step Use the extracted features to train a classification or regression model on
the training data to predict the label of the data.
Testing Test the model produced from the training step on a set of test images.
An example of a computer vision problem is trying to predict the tags for an image, given
the image, relevant metadata and any related images.
Problems with the current workflow for machine learning on images
Each step is done separately This is inefficient because researchers have to manually build and run each step. They need to store the results from one step so that the next step can use them, which takes space and time. By combining all the steps
in the computer vision pipeline into an end-to-end framework, we can save researchers
time and space by allowing researchers to provide inputs into one system and receive
their desired outputs without having to handle all the data in between.
Large Datasets Image datasets are becoming larger and larger as more images become
available online. Yahoo!, for example, has provided researchers access to a dataset
of 99.3 million images and 0.7 million videos, which takes about 12GB just for the
metadata [15]. If the actual images and videos were included, the size would be even larger. We want to build a parallel
computing architecture to process and perform machine learning on larger datasets.
Storing features takes a huge amount of space. For the dataset Yahoo! provided, they
planned to pre-compute and store a number of computer vision and audio features for
each image and video. They estimated that storing all these features would take 50
TB [15]! Instead, if we are able to design a system that can extract features on demand and enable analyses, we would no longer need to store features in a database
and perform queries and database lookups, which can be quite expensive. This would
greatly decrease the storage overhead required for many problems in computer vision.
Parameter Tuning There are many different ways to process the data, many different machine learning algorithms, and many parameters that can be tuned when extracting features. Not extracting the best possible features can potentially limit how well the machine learning algorithm performs. Current research is mainly focused on tuning the hyperparameters of the classifier and on feature selection instead of focusing on optimizing the whole pipeline.
All other methods we looked at first extract features and then focus on figuring out the best regression or classification model to build. We want to optimize not only the prediction model, but also the feature extraction and image pre-processing steps. Many computer vision feature extractors come with a set of parameters that the user can tune while extracting these features. For example, with the SIFT feature extractor, one can tune parameters like the edge threshold, the number of octaves, and the peak threshold.
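As an illustration of what these tunable settings look like in practice, the sketch below shells out to a SIFT executable with a few of these parameters exposed. It is a minimal sketch rather than ImageMiner's actual code, and the flag names are assumed to follow VLFeat's command-line sift tool, so they should be checked against the installed version.

```python
import subprocess

def run_sift(pgm_path, out_path, octaves=3, edge_thresh=10.0, peak_thresh=0.0):
    """Run a SIFT extractor binary on a PGM image with tunable parameters.

    Assumes a VLFeat-style `sift` executable on the PATH; the flag names
    below should be checked against the installed version.
    """
    cmd = [
        "sift", pgm_path,
        "--output", out_path,
        "--octaves", str(octaves),
        "--edge-thresh", str(edge_thresh),
        "--peak-thresh", str(peak_thresh),
    ]
    subprocess.run(cmd, check=True)

# Example of the kind of setting a tuner might try:
# run_sift("photo.pgm", "photo.sift", octaves=4, edge_thresh=7.5, peak_thresh=0.5)
```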
1.1
What is ImageMiner?
The goal of ImageMiner is to build an architecture to support deep mining of images.
Definition 1. Deep mining is defined as parameter tuning that attempts to tune the entire machine learning pipeline, including the parameters involved in feature extraction.
As we mentioned above, typically, only the parameters for the classifier are tuned. By
tuning the parameters during the feature extraction process, we can extract better features,
which in turn can lead to better classifiers and better results. ImageMiner is an attempt to build what is, to the best of our knowledge, the first system of its kind to tune parameters and extract features on the cloud without storing any features.
However, feature extraction is an expensive process. Properly tuning feature extraction parameters requires multiple iterations with different sets of parameter values, which means performing feature extraction several times over the same data to tune one set of parameters. This is a major challenge that ImageMiner aims to address.
To implement all this, we built an architecture in Java and Python that runs on Amazon
Web Services (AWS). This is done using a Master-Slave framework. We parallelized the
process by creating multiple Amazon EC2 instances to run each slave on so that multiple
slaves may be running ImageMiner at one time.
ImageMiner is split into two modules: the Training module, which tunes parameters and
produces a model, and the Testing module, which tests the models produced by the Training
module.
We downloaded the Flickr Creative Commons dataset provided by Yahoo! to get a
database of images to train and test ImageMiner on [15].
The rest of the thesis is laid out as follows. Chapter 2 discusses the geolocation problem.
Chapter 3 describes the ImageMiner architecture. Chapter 4 goes over the experiments we
performed on ImageMiner and the results from those experiments. Chapter 5 discusses the
challenges that were faced while building ImageMiner. Chapter 6 talks about future work
left for ImageMiner.
Chapter 2
The Geolocation Problem
Geolocation is the problem of identifying the location of something, in our case an image, based on the information provided by the image. Geolocation is an important problem
because people like to see a visual picture of whatever they are searching for or studying.
When people see a picture of a location, they immediately want to know when, and more
importantly, where the picture was taken. The location of a picture can provide a lot of
context for an image and change the meaning of the picture.
Geolocation can also affect our personal lives, as well. If a friend on Facebook posts a
picture, we want to know where that picture was taken. If we see a picture of somewhere
beautiful online, we want to know where that picture was taken so that we may perhaps
visit that place.
2.1
The Data Set
The dataset that we will be using for this geolocation problem comes from the Flickr Creative
Commons dataset provided by Yahoo for research purposes [15]. This dataset consists of 99.3
million images, 49 million of which are geo-tagged, and 0.7 million videos. The metadata
for each image or video contains the title of the image or video, the user id of the uploader,
the URL for the image, tags for the image, latitude and longitude for geo-tagged images,
and several other useful facts about the image. We had to submit a request to the Yahoo!
Webscope program for approval to obtain use of the dataset, which is hosted on Amazon S3.
Table 2.1: Data for One Image Metadata File

Number of total images | 9975030
Number of images with geolocation | 1541252
Number of images with tags | 6014968
Number of images with user tags | 5969236
For our project, we narrowed the dataset from 99.3 million images down to about
1.5 million images. The metadata we received from Yahoo! was simply a list of 10 text files
with one line of metadata for each image. To narrow down the dataset to a manageable
amount, we selected one of the 10 files and took all the images with a geolocation, which
was a little more than a tenth of the images, to give us our 1.54 million image dataset, as
shown in Table 2.1.
Figure 2-1 shows a heat map of the location of the 1.54 million images in our dataset.
As expected, most of the images are located on the coasts of the continental United States, as
well as central and western Europe, while almost no images are present closer to the poles.
Figure 2-1: A heat map showing a distribution of all the images in our dataset. The heat
map was generated using heatmap.py [5] and OSM Viz [13]
2.2
Multiple Methodologies
When doing any type of machine learning, there are many different ways to define the
problem. The three main methods for this problem are classification, regression, and a
hierarchical approach.
2.2.1
Classification
The first step in doing a classification problem is to assign a label to each image. There are
two common ways to do this. The first is by clustering the images.
Clustering images can be done a number of ways, such as k-means. Once the images are
clustered, each image is given a label corresponding to what cluster they are in. Features are
then extracted from each image and a classifier is trained using the extracted features and the
image labels. Future images are classified by extracting their features and running them through the trained classifier. A variation of this method is to set a threshold after the images are
clustered. Only images or clusters that pass the threshold are used for the classifier.
The second method is to assign each image to a city, depending on what cities the image
is close to. For example, one image can be assigned to Paris, while another is assigned to
New York. When trying to classify an image, the classifier will produce a list of possible
cities and the image will be assigned to the most likely city.
To determine the performance of the classifier, we count the percentage of images that
have been assigned the correct label.
2.2.2
Regression
Regression for geo-tagging problems is done by extracting features from an image and, after performing some computation on those features, building a model between the image features and the latitude and longitude of the image. To perform regression, we first extract features from a number of images. Then, given each image's latitude and longitude, we train a regression model to link an image's features to its geolocation. Future images are geo-tagged by extracting their features and running them through the model to produce a latitude and longitude. To evaluate the results, the estimated latitude and longitude are compared to the actual latitude and longitude to see how far off the guess actually was.
2.2.3
Hierarchical
A hierarchical approach is done by dividing the world map into sections and iteratively dividing those sections into smaller sections. After extracting features, the images are iteratively
assigned to a section and within that section assigned to another section until they have
been assigned to a section on the lowest level. An estimated location is considered correct if
it is assigned to the same section as the actual location.
2.3
Past Approaches
Several efforts have already been made to geo-tag images and videos based on the features
and metadata of the images. Many of these have come as a result of the MediaEval Placing Task, which challenges researchers to accurately locate where an image or video was taken based on the features of the image and the metadata for that image [1]. There
were several different approaches to trying to accurately place these multimedia items.
One group extracted a variety of different features, such as Gist and color histograms,
and then applied k-Nearest Neighbors to form a distribution of likely locations across the
world, also known as a probability map. The location of the highest probability was the
likely location of the image [7]. However, this approach had less than a 20% accuracy within
200 km.
Another group had a different approach to feature extraction. Their approach centered
on extracting features, such as SIFT and color histograms, from an image to create a feature
vector and storing that feature vector in a dictionary of scenes. When trying to geo-tag an
image, they extract the feature vectors from the image and compare it to those already in
the dictionary to determine the most likely location [14]. This approach, although slightly
better than [7], still only had roughly a 25% success rate within 200km.
On the other end of the spectrum, another group only used feature extraction as a last
resort [16]. This group would first search for any tags for the image and use frequency
Table 2.2: Information about previous approaches to geotagging

Study | Features Used | Methodology | Algorithm
[9] | Tags, Color histogram, Texton histogram | Hierarchical | Border detection, Iteration
[4] | Tags, FCTH, CEDD, Tamura, Gist | Regression | Graphical Models, Conditional dependency, Gaussian Mixture Model
[10] | Tags, User Profile, Color histogram, FCTH, CEDD, Tamura, Gabor, Edge histogram | Classification | Prior Distribution
[16] | Tags, User Profile, SIFT | Hierarchical | IR Frequency, Frequency matching, Filtering, Prior Distribution
[7] | Color histogram, Texton histogram, Tiny images, Line features, Gist, Geometric context | Classification | k-Nearest Neighbor, Probability Map
[3] | Tags, Color histogram, Tiny images, Gist | Regression | Canonical Correlation Analysis, Logistic Canonical Correlation Regression, Mean shift algorithm
[14] | Color histogram, FCTH, CEDD, Tamura, Gabor, Edge histogram, SIFT | Classification | Bag-of-scenes, Visual dictionary
matching to place the most likely location of the image, given the set of tags. If no tags
were available, then the user profile of the person who uploaded the image was used. They
would use information, such as the user upload history, the user’s hometown, or the user’s
social network information to guess the location of the image. If nothing useful could be
extracted from the user’s profile, the group would extract features from the image, using
SIFT. A nearest neighbor search would then be performed to determine the location of the
image. Although they performed better than [7], they only had roughly a 50% accuracy
within 200 km. Although this seems significantly better than the previous two groups, most
of this boost is from the use of tags and the challenge of accurately predicting the location
of images through images features still remains.
Several other groups combined feature extraction with metadata from the images, such
as user information and tags, to try to determine the location where the images were taken [9, 4, 10]. Out of these, [9] performed the best, with over a 98% accuracy within 200 km. They divided the world map along national borders to try to narrow down possible areas where the image could have been taken, combined their feature extraction and image metadata with a probabilistic model, and then used a centroid-based candidate fusion to finally estimate where the image was taken [9].
One thing in common that all the groups had was that when they attempted to estimate
the location where the multimedia item was taken, their algorithm would return a latitude
and longitude. To test the performance and accuracy of their algorithm, they would measure
how far their predicted latitude and longitude was from the actual latitude and longitude.
Most groups would then determine the accuracy of their algorithm based on how far their
predictions were from the actual locations within a variety of distances, such as 100km and
200km.
2.4
Our Approach
Our approach to the problem was to use the classification method. To cluster the images, we implemented a very simple clustering algorithm that initially puts all the images in a single cluster and then iteratively divides each cluster. Only clusters larger than 200km were divided, while those smaller than 200km were left as is. We defined the distance of a cluster to be the largest distance between two images in the cluster. To divide a cluster, we picked two images that were at least 500km apart, or at least 200km apart if no such images could be found. These images became the first images of the two newer and smaller clusters. We then went through every image in the original cluster and assigned it to one of the newer clusters, depending on which of the two seed images it was closer to. We repeated this process until all clusters were smaller than 200km.
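A minimal sketch of this splitting procedure is shown below. Images are represented simply as (latitude, longitude) pairs, and the function names are our own, not ImageMiner's actual implementation.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def find_seeds(points, far_km=500.0, min_km=200.0):
    """Find two points at least far_km apart, falling back to min_km apart."""
    for threshold in (far_km, min_km):
        for i, p in enumerate(points):
            for q in points[i + 1:]:
                if haversine_km(p, q) > threshold:
                    return p, q
    return None  # no pair more than min_km apart: the cluster is small enough

def split_clusters(points):
    """Iteratively split clusters until every cluster is at most 200 km wide."""
    pending, done = [points], []
    while pending:
        cluster = pending.pop()
        seeds = find_seeds(cluster)
        if seeds is None:
            done.append(cluster)
            continue
        a, b = seeds
        near_a = [p for p in cluster if haversine_km(p, a) <= haversine_km(p, b)]
        near_b = [p for p in cluster if haversine_km(p, a) > haversine_km(p, b)]
        pending.extend([near_a, near_b])
    return done
```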
We ended up with 29789 clusters for our 1.5 million images. The distribution of the images in the clusters is shown in Figure 2-2. A cutoff was then set to select only clusters with 100 or more images in them, to reduce the dataset and to eliminate smaller clusters that did not have enough images to be useful for training our classifier. This gave us our final dataset of 825659 images with 2766 clusters. We split two-thirds of the images to use as training data, while designating the remaining one-third as test data.

Figure 2-2: Distribution of images for clusters with greater than 100 images in them.
Chapter 3
ImageMiner
ImageMiner is an end-to-end image mining system designed to automatically tune feature
extraction parameters. It is split into two modules: the Training module, that trains the
classifier and tunes the parameters, and the Testing module, that takes in and tests all the
classifiers generated by the Training module.
In this chapter, we discuss the design decisions, as well as the architecture behind ImageMiner. ImageMiner uses a Master-Slave framework to train and test the models. Each
module uses Amazon Web Services (AWS) to get image metadata, store results, and run
each slave. In addition, we use a variety of other systems to train our classifier models and
run feature extractions. This chapter will discuss all of these systems in greater depth.
3.1
Goal
The goal of ImageMiner is to find the best set of parameters for feature extraction when doing machine learning on images, such as geo-tagging them. There are a few goals ImageMiner strives
to achieve:
Usability - It should be easy to understand and use.
Flexibility - Users should be able to customize inputs, such as what features to extract,
what parameters to tune, and what images to train and test on.
Scalability - Tuning on one hundred images should be just as easy as tuning on twenty
thousand images.
Fault Tolerance - The system should be able to deal with errors seamlessly in the background while processing images and extracting features. Ideally, the user will not even
know that an error occurred.
Speed - Users will be using ImageMiner to determine the best set of parameters. ImageMiner should run fast enough that it does not become a bottleneck for whatever situation it is being used in.
3.2
Possible designs
To find the best set of parameters for feature extraction, while also meeting the goals mentioned above, we had to think carefully about the design of ImageMiner. To determine the
best set of parameters, we need to run multiple iterations of the entire machine learning
pipeline, while using a parameter tuning algorithm to generate the next set of parameters
to use and test. Each iteration repeats the following steps:
1. Extract visual features from each image
2. Process the data for classification
3. Do cross-fold classification on 𝑙 folds
4. Report performance
The performance of the parameters is judged using the cross-validation accuracy of the
entire process. For a small number of images, we can easily run the entire process on one
machine. However, as the number of images and iterations increase, it becomes infeasible to
run everything on one machine.
There are two main designs we considered for ImageMiner: one, similar to PhysioMiner, that divides the problem into separate tasks where each worker runs on a subset of the data [6], and one with a centralized database where each slave runs one iteration.
3.2.1
Data Parallel Design
The design of this approach is to create a master worker, which divides the problem into
several tasks. Each task is to run all iterations of ImageMiner on a subset of the images. The
master worker creates slave workers to perform each task. Each slave picks a fixed subset of images to train and test on, individually tunes its own parameters, and then reports back to the database with its best model and corresponding set of parameters. This means that each slave runs independently of the other slaves. Each slave produces one classifier, so ImageMiner ends up with as many classifiers as there are slaves.
A subset of this design is Noisy Parameter Tuning. For this design, instead of each slave
having a fixed subset of images to train and test on, it randomly selects a subset of images
for each iteration and reports the performance on that random subset.
3.2.2
Parallel Iteration Design
The design for this approach was to create a centralized database to store the best set of
parameters and update those parameters every time a worker runs. Each worker runs one
iteration of ImageMiner by getting the best parameters from the centralized database and tuning on those parameters, before reporting its results back to the centralized database. The next worker grabs the results the previous worker reported and repeats the process. Thus, each worker is dependent on the results of the previous workers. This produces one classifier in total, which is deemed the best classifier for feature extraction.
We ended up going with the data parallel design for scalability. This design is the least
expensive since it requires the fewest calls to the database. In addition, each worker runs
independently of other workers, so if one worker goes down or produces bad results, the other
workers can compensate for that.
3.3
ImageMiner Architecture
ImageMiner is designed with a master-slave framework using Amazon AWS to communicate
between the master and the slave. The workflow is shown in Figure 3-1.

Figure 3-1: The basic system architecture for ImageMiner, showing both modules. There are three main components: the database, the master-slave framework, and the file storage. Please refer to the framed text in section 3.3 for more details.

There are 11 steps to ImageMiner:
1. The user passes in the location of the images, information about the feature and
parameters, as well as the S3 bucket and DynamoDB table to use
2. The Master starts running the Training module by creating ImageMiner messages
based off the user inputs and sends those messages to Amazon Simple Queue Service
(SQS).
3. Each EC2 Worker grabs an ImageMiner message from SQS to run.
4. Using the DynamoDB table information provided by the user, each worker grabs a set of image metadata from the DynamoDB database to train on.
5. After running the Training module, each EC2 worker writes the results of the
module to another table in DynamoDB.
6. Each worker also writes the best model and the corresponding set of parameters to
Amazon Simple Storage Service (S3).
7. Once the Training module finishes running, the user prompts the Master to start
running the Testing module. The Master starts running the Testing module by
creating Testing messages that are then passed to Amazon SQS.
8. Each EC2 Worker grabs a Testing message off the queue.
9. From the DynamoDB table information provided by the user, each worker grabs a set of images from the DynamoDB database to test on.
10. From the S3 bucket information provided by the user, each worker also grabs the
model and parameter information stored in S3 by the Training module
11. After running the Testing module, each worker then writes the results to S3.
3.3.1
Master
The user can give the master a variety of inputs, but they must provide an S3 bucket, a table
name, a file with a list of features and feature locations, a file with a list of parameters for
each feature and the bounds and default for each parameter, the number of slave workers to
create, and the IAM profile to use on Amazon AWS. The master processes the input from
the user and then uses the information to create the slave workers and populate the message
queue.
The master is responsible for creating tasks. Each ImageMiner job is split up into messages that are then put into the message queue for slave workers to read. The master can
be run on any machine (personal laptop, EC2 instance, etc.). Dividing the tasks up usually
takes less than a minute, so the master runs quite quickly. The jobs for the Training and Testing modules differ quite a bit. We will go into more detail on what each job looks like
later in the chapter.
3.3.2
Slave
The slaves are responsible for completing the tasks in the message queue. When a slave
starts running, it queries the message queue for a message. Once it receives a message, it
parses the message to extract the task, as well as any other necessary information required
to complete the task. Once the slave finishes a task, it writes the result to the database,
and deletes the message from the queue and looks for another message to grab. If there are
no more messages on the queue, the worker shuts down. Because the tasks for the Training
and Testing modules are completely different, the slaves for each module function completely
differently. We will detail how each slave functions later in the chapter.
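ImageMiner's slaves are written in Java, but the polling loop described above can be sketched in a few lines of Python with boto3; handle_task is a hypothetical callback standing in for the module-specific work.

```python
import boto3

def worker_loop(queue_url, handle_task):
    """Poll SQS for tasks until the queue is empty, then shut down.

    handle_task is a hypothetical callback that parses the message body,
    completes the task, and writes its results to the database.
    """
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20)
        messages = resp.get("Messages", [])
        if not messages:
            break                                  # no work left: shut down
        msg = messages[0]
        handle_task(msg["Body"])                   # run the task to completion
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```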
ImageMiner runs each slave on an Amazon EC2 instance. Each instance is its own virtual
machine, with its own memory and storage and runs independently of all the other instances.
These instances are automatically created by the master, so there is no need for the user to
create them. The number of instances created by the master is inputted by the user.
3.3.3
Other Systems
ImageMiner uses a lot of different systems to run and support the Master-Slave framework.
AWS
Amazon Web Services (AWS) is the backbone of the Master-Slave framework. AWS services
are used to pass messages between the master and slaves, to store results and files, and to
run the EC2 instances for the slaves.
DynamoDB DynamoDB is used to store the image metadata that ImageMiner uses, as
well as the results from ImageMiner. It is a NoSQL database designed to be fast and scalable
and it supports hash keys and range keys for each table. A very simple API is provided for
writing to and reading from the database [6].
Images Table The images table stores the metadata for all the images that can be
used for training and testing the classifier and tuning the parameters. Each row represents
an image, which has a unique Image ID. Group type represents whether the image will be
used for training or testing. File path refers to the location of the file on the Internet. The
setup of the table is shown in Table 3.1.
Feature Parameters Table The feature parameters table stores the results of the
classifiers generated by the Training module. Each row represents a result and each result
has a unique ID. Each row contains the cross-training accuracy and standard deviation, as
well as the testing accuracy and the parameters used to attain these results. The setup of
the table is shown in Table 3.2.
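A rough sketch of how items in these two tables might be written with boto3 is shown below. The table and attribute names are illustrative guesses based on Tables 3.1 and 3.2, not ImageMiner's actual schema, and numeric fields are stored as strings here to keep the sketch simple.

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# One item per image, keyed by image ID (fields follow Table 3.1).
images = dynamodb.Table("images")
images.put_item(Item={
    "image_id": "220145245",
    "group_type": "Training",
    "cluster_number": 1626,
    "file_path": "https://...",          # Flickr URL for the image (placeholder)
    "latitude": "33.050786",
    "longitude": "-117.29169",
    "tags": "ds,game",
    "user_id": "48018609@N00",
})

# One item per trained model (fields follow Table 3.2).
results = dynamodb.Table("feature_parameters")
results.put_item(Item={
    "id": "BRtcONQqzQVyfKM74iEvK68",
    "cv_accuracy": "7.40778",
    "cv_std": "20.9523596",
    "test_accuracy": "0",
    "parameter_1": "7.67150535",
})
```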
EC2 Elastic Compute Cloud (EC2) is a scalable cloud on AWS. It provides a simple API
to start and stop virtual machines, also known as instances. These instances are used to
run the slave workers. They give the user full control over each instance and there are many
different instance types that can be created depending on the amount of RAM, CPU, and
disk space needed [6]. The default instance type for ImageMiner is r3.large, but the user can provide a different instance type depending on their needs. We decided to use
r3.large instances because they were relatively cheap and could easily support large amounts
of memory. Each EC2 instance is created from an "image", which is essentially the template
for every instance created from the image and provides the necessary information to launch
and run the instance. Instances created from the same image start out completely identical.
S3 Simple Storage Service (S3) is a scalable storage mechanism on AWS that allows the user
to store files. This is used to store the best model and the corresponding set of parameters
from each slave.
SQS Simple Queue Service (SQS) is an AWS messaging system. It contains a messaging
queue that is used by ImageMiner to pass messages from the master to the slave. The master
adds messages onto the queue and each slave pops messages off the queue to perform the tasks. It is designed to handle concurrency, as multiple workers can access the queue at the
same time [6].
Scikit-Learn
Scikit-Learn is a machine learning library in Python. It provides a variety of machine learning
tools, such as classification, regression, and clustering. For our purposes, we used Scikit-Learn to perform k-means clustering on our extracted SIFT descriptors to provide a uniform
number of features for each image.
SVMLight
SVMLight is a Support Vector Machine (SVM) implementation written in C [8]. It provides a variety of different SVMs for different use cases, but ImageMiner uses SVMLight Multiclass to classify our images. This is necessary because each cluster label is a class, so we
have 2766 different classes available to classify an image into.
VLFeat
VLFeat is an open source computer vision library [17]. VLFeat provides many different tools
for image feature extraction, as well as many other algorithms relating to image processing.
We used VLFeat’s implementation of SIFT as our feature extractor and ImageMiner tunes
the parameters of SIFT to determine the best set of parameters.
SIFT SIFT stands for Scale-Invariant Feature Transform (SIFT). The idea behind SIFT
is to find the key points of an image and compute its descriptors. These points should be
invariant to any type of scaling, orientation changes, distortion, or illumination changes.
Once the key points for each image are found and the descriptors are calculated, k-means
clustering is run on all descriptors across all images to cluster the descriptors. To get the
features of an image, a histogram of the distribution of its descriptors across the clusters is
calculated. To compare two images, the histograms of their SIFT descriptors are compared [11].
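A minimal sketch of this bag-of-visual-words step using scikit-learn's k-means is shown below; the function name is ours, and 128-dimensional SIFT descriptors are assumed.

```python
import numpy as np
from sklearn.cluster import KMeans

def bag_of_words_features(descriptors_per_image, n_clusters=100):
    """Turn variable-length SIFT descriptor sets into fixed-length histograms.

    descriptors_per_image: list of (n_i, 128) arrays, one per image.
    Returns an (n_images, n_clusters) array of descriptor counts per cluster.
    """
    all_descriptors = np.vstack(descriptors_per_image)
    kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(all_descriptors)
    histograms = []
    for descriptors in descriptors_per_image:
        assignments = kmeans.predict(descriptors)
        histograms.append(np.bincount(assignments, minlength=n_clusters))
    return np.array(histograms)
```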
3.4
Training Module
The Training module is the module that performs the parameter tuning and trains the
classifier. It tunes the parameters for the SIFT extractor and trains a classifier using those
parameters. It reports the best classifier and best set of parameters.
3.4.1
User Input
To run the Training Module, users need to specify the following items:
1. S3 bucket - The bucket to store the models and parameters from ImageMiner
2. Features Text File - A text file containing each feature’s name, along with the
location of the feature executable and the format of the output. If the output is
written to an output stream, then no format is provided.
3. Parameter Text File - A text file containing the parameters for each feature, along
with the default value for each parameter, the bounds of the parameter, and whether
the parameter is an integer or a real value.
4. Table name - The name of the DynamoDB table to write the results out to, as well
as the name of the DynamoDB table to grab the image metadata from.
The user can also choose to specify a variety of other inputs, such as the number of
training and test images to use or the number of cross-validations to do. If no input is
provided, a default value is used.
3.4.2
Master
The master receives several text files from the user. The master processes these text files
and puts the information into an ImageMiner message that goes onto the message queue.
The slave can then parse the information about the features and parameters to generate
parameter values and run the feature extraction.
The master also receives input about the S3 bucket for ImageMiner to use, as well as
the table name, and information about how to create the EC2 instances. The master uses
some of this information to create the EC2 instances, and passes the rest to the slaves via
the ImageMiner message.
3.4.3
Slave
Once a slave reads an ImageMiner message from the queue, it performs the following steps
as shown in Figure 3-2:
1. Downloads image metadata from DynamoDB
2. Processes the metadata, downloads the image from the metadata, and converts the
image to PGM format
3. Generates parameters
(a) For iteration 1, use the default parameter values.
(b) For iterations 2-5, randomly generate parameter values.
(c) For the rest of the iterations, run a parameter tuning algorithm based off previous
results
4. Runs SIFT using the generated parameters on the PGM file
5. Process the SIFT descriptors to make suitable features for training the images.
(a) Cluster the SIFT descriptors into 100 different clusters using k-means.
(b) For each image, create a histogram of the distribution of its descriptors over the
100 clusters.
(c) The counts of descriptors in each cluster are now the features for each image.
6. Perform cross-validation (a minimal sketch of this loop is given after this list)
(a) Divide the data into 𝑙 folds.
(b) Choose 𝑙-1 folds to train the classifier on
(c) Test on the remaining fold
(d) Repeat steps a-c until all folds have been tested on.
7. Repeat steps 3-6 for a specified number of iterations
8. Determine the best model out of all the models generated
9. Store the performance of the model in DynamoDB and the model and the set of
parameters that generated the model in S3
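Below is a minimal sketch of the cross-validation loop in step 6, using scikit-learn's KFold and LinearSVC as stand-ins for ImageMiner's own fold logic and the SVMLight classifier it actually shells out to.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import LinearSVC

def cross_validate(features, labels, n_folds=10):
    """l-fold cross-validation accuracy for one candidate parameter set.

    features, labels: numpy arrays of bag-of-words histograms and cluster IDs.
    LinearSVC stands in here for the SVMLight multi-class classifier.
    """
    accuracies = []
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in folds.split(features):
        clf = LinearSVC().fit(features[train_idx], labels[train_idx])
        accuracies.append(clf.score(features[test_idx], labels[test_idx]))
    return np.mean(accuracies), np.std(accuracies)
```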
3.5
Testing Module
The Testing module grabs all the classifiers and parameters generated from the Training
module and tests all of them to determine the overall performance of the ImageMiner system.
It writes the result out to DynamoDB.
3.5.1
User Input
To run the Testing Module, users need to specify the following items:
1. S3 bucket - The bucket needed to access the models and parameters and to write the
predictions out to
2. Table name - The name of the DynamoDB table to grab the image metadata from
The user can also choose to provide the number of images to test on. If no number is
provided, a default value is used.
3.5.2
Master
The master is given the S3 bucket that contains the models and parameters generated from the Training module, as well as the table name to write results out to. The master passes
these specifications to the slave via a Testing message that goes onto the messaging queue.
3.5.3
Slave
Once a slave receives a Testing message, it performs the following steps as shown in Figure 3-3:
1. Downloads image metadata from DynamoDB
2. Gets the set of best parameters and models from Amazon S3
3. Processes image metadata, downloads the image from the metadata, and converts the
image to PGM format
4. For each model/parameter pair
(a) Run SIFT using the given parameters on the PGM file
(b) Process the SIFT descriptors to make suitable features for training the images.
i. Cluster the SIFT descriptors into 100 different clusters using k-means.
ii. For each image, create a histogram of the distribution of its descriptors over
the 100 clusters.
iii. Each histogram is now a feature for each image.
(c) Put the results of the k-means clustering through the given classifier model
(d) Store the prediction for the classifier
5. For each image, get the predictions from all the models and write them to a file and
put that file on S3.
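The final predictions are formed by a majority vote over the per-model predictions for each image; a minimal sketch is shown below (the function name is ours).

```python
from collections import Counter

def majority_prediction(predictions_per_model):
    """Combine per-model cluster predictions for one image by majority vote."""
    return Counter(predictions_per_model).most_common(1)[0][0]

# Example: three models vote for cluster 1 and one for cluster 88 -> returns 1.
# majority_prediction([1, 88, 1, 1])
```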
3.6
Parameter Tuning Algorithm
To tune our parameters, we used a parameter tuning model based on a Gaussian Copula Process (GCP). The model, when given a list of previously used parameter sets, as well as the previous results, generates the next set of parameters to test.
The Gaussian Copula process works like a Gaussian process, but the marginal distribution
and mappings are modified to deal with the instability of the Gaussian process and offer
greater flexibility.
To generate the next set of parameters to use for the machine learning pipeline, ImageMiner passed a list of tested parameter sets and the performance achieved with those parameters to the parameter tuning model, which then used the upper bound criterion [2] as an acquisition function. The next set of parameters was chosen by maximizing the acquisition function over the Gaussian Copula distribution.
For ImageMiner, the first iteration always used the default parameters supplied, while
iterations two through five used randomly generated parameters. After the fifth iteration,
the GCP Parameter Tuning algorithm was run to determine the best next set of parameters
to use.
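The sketch below illustrates this select-by-acquisition step. A plain Gaussian process from scikit-learn stands in for the Gaussian Copula Process, and the random candidate-sampling scheme is our own simplification, not ImageMiner's actual tuner.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_parameters(tested_params, scores, bounds, kappa=2.0, n_candidates=1000):
    """Pick the next parameter set by maximizing an upper-bound acquisition.

    tested_params: list of parameter vectors tried so far.
    scores: cross-validation accuracy achieved by each of them.
    bounds: list of (low, high) pairs, one per parameter.
    """
    surrogate = GaussianProcessRegressor().fit(np.array(tested_params),
                                               np.array(scores))
    lows, highs = np.array(bounds, dtype=float).T
    candidates = np.random.uniform(lows, highs, size=(n_candidates, len(bounds)))
    mean, std = surrogate.predict(candidates, return_std=True)
    ucb = mean + kappa * std               # upper-bound acquisition function
    return candidates[np.argmax(ucb)]
```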
3.7
Summary of ImageMiner
ImageMiner acts as a black box, so users can simply input a few specifications and ImageMiner will run and try to determine the best set of parameters. There are a number of goals
that ImageMiner strives to achieve to be the most helpful to users.
First, ImageMiner needs to be usable. To accomplish this, ImageMiner provides a simple
command line interface to run the system, as well as a command line help menu that describes
all the possible arguments that ImageMiner takes.
Second, ImageMiner needs to be flexible. ImageMiner allows each user to provide a
simple text file detailing what feature extraction scripts should be run and another text
file to describe the parameters that should be tuned. In addition, users can change various
aspects of the ImageMiner system by supplying a few extra command line arguments.
Third, ImageMiner needs to be scalable. To make sure this goal was achieved, we made sure that each of the external systems integrated into ImageMiner was also scalable. Additionally, increasing the number of workers easily allows the user to scale the
number of images used to tune and test the classifier.
Fourth, ImageMiner needs to be fault tolerant. While building ImageMiner, we included
a variety of different error-handling methods to deal with errors in every part of the system,
from converting each image to PGM to parameter generation to training the classifier.
Lastly, ImageMiner needs to be fast. This goal is yet to be achieved, especially as the number of images increases, but we are hopeful that it can be accomplished.
Table 3.1: The fields of the images table. Each row represents an image with a unique image ID. The group type of an image specifies whether the image will be used for training or testing. The file path is the location of the image on the Internet. The latitude and longitude specify the location of the image. The tags are the tags that were given to the image, either by the user or from machine tags. The user ID is the ID of the user who uploaded the picture.

Image ID | Group type | Cluster number | File path | Latitude | Longitude | Tags | User ID
220145245 | Training | 1626 | image flickr URL | 33.050786 | -117.29169 | ds,game | 48018609@N00
245225901 | Testing | 981 | image flickr URL | 52.295462 | 13.25715 | digimax, himmel, ludwigsfelde, samsung, sun | 30794983@N00

Table 3.2: The fields of the feature parameters table. Each row represents the result from a model generated by a slave worker. Each result has a unique ID, along with the cross-validation accuracy and standard deviation. Each row also includes the testing accuracy and the parameter values used to obtain these results.

ID | Cross-validation Accuracy | Cross-validation Standard Deviation | Test Accuracy | parameter 1 | parameter 2 | parameter 3
BRtcONQqzQVyfKM74iEvK68 | 7.40778 | 20.9523596 | 0 | 7.67150535 | 2 | 88
rG7PNHcKAhKwlN8okQdhir | 5 | 15 | 0 | 15.534482 | 3 | 1
Figure 3-2: The flow of a Training module worker: download image metadata; process, download, and convert each image to a PGM image with its label; generate parameters (randomly for the first runs, then tuned from past results); extract features; train and test the classifier; and report the best model and its corresponding set of parameters.

Figure 3-3: The flow of a Testing module worker: download image metadata; process, download, and convert each image to a PGM image with its label; get the models and their parameters; extract features; test each classifier; and take the majority result across models to produce the final predictions.
Chapter 4
Experiments and Results
We ran two different experiments to test the performance of ImageMiner. Our first
experiment was run with 5 workers with 20 images per worker. The experiment also used 10
iterations with 10 cross-validations in each iteration. The Testing module for this experiment
ran with 4 workers with 13 test images per worker on 5 different models. The second experiment was run with 50 training workers with 200 images per worker with 10
iterations and 10 cross-validations per iteration. The Testing module ran on 100 workers with
250 images per worker on 67 models, generated from the Training module. The experiments
are also described in Table 4.1.
The results from our first experiment are shown in Table 4.2. The results of the Testing
module are shown in Table 4.3.
Given the small number of images per worker and the large number of clusters, it’s
no surprise that the testing results are poor. However, the cross-validation accuracies are
much better, albeit with extremely large standard deviations. This makes sense because the
workers pick the best classifier, so it is not surprising, given 10 iterations, that one of the
iterations correctly classified more than a couple of images.
The results from our second experiment are shown in Appendix B, while a graph of the
results are shown in Figure 4-2. The results from the Testing module are shown in Appendix
B.
The results from the Testing module are not great, with each model having less than a
1% accuracy. Given the number of total clusters, even a 0.1% to 0.9% accuracy is actually
more than 3 times better than random guessing. However, the majority predictions actually
have the worst accuracy of 0.008%, which is even worse than random guessing. Looking at
the predictions, because of the large number of images in cluster 1, many of the models end
up predicting, incorrectly, that an image is in cluster 1, so the majority prediction for nearly
all the images is cluster 1, which is incorrect most of the time.
Looking at the per-class accuracy, unsurprisingly, cluster 1 is predicted correctly more
than 10 times as often as the next cluster. Only 10% of the clusters had a non-zero accuracy,
which is lower than we would have preferred, but given the number of predictions for cluster
1, not surprising.
Figure 4-1: Performance of the models generated by the Training module at each iteration (GCP accuracy vs. GCP iteration).
Figure 4-1 shows the average accuracy of all the models for each of the 10 iterations.
Interestingly, the first iteration, where ImageMiner used the default feature values, performed
the best. The iterations with randomly generated parameters performed much worse, which
was expected, while each of the iterations that used parameter tuning performed better than
each of the iterations that generated random parameters. There is a general upward trend
for each iteration that uses parameter tuning, which ends with the last iteration producing
the best results.
Looking at the graph, it is encouraging to see that each iteration of the parameter tuning algorithm produces better or comparable results to previous iterations. However, from
the tests, it seems that using the default parameters actually produces the best results. It would be interesting to run the experiment with more iterations to see if the performance of the
parameter tuning algorithm can eventually overtake the performance of the classifier using the default parameter values.

Figure 4-2: Performance of each of the classifiers generated by ImageMiner (cross-validation accuracy per model).
Figure 4-2 shows the cross-validation accuracy and standard deviation of each classifier
generated by ImageMiner. Most of the models have an accuracy between 0% and 5%,
although a few models perform especially well, with accuracies over 10%. Although 5%
accuracy appears to be fairly low, it is important to note that there are 2766 different
classes, so a 5% accuracy is significantly better than randomly guessing.
For more in-depth results, please refer to Appendix B.
Many related studies we looked at had much higher accuracies, ranging from 20% all the
way up to over 90%. In comparison, ImageMiner looks much worse. However, it is important
to remember that ImageMiner is first and foremost an architecture to improve parameter
selection. The results produced are mainly to make sure the architecture is in place.
Although a 5% accuracy is relatively low compared to the other studies we looked at,
ImageMiner only used 1 feature, SIFT, while other studies used at least three features.
In addition, using tags as a feature, which we do not do, produced the best results. The
performance of studies using only image features was only around 20% to 25%. Looking at
our results in this light, 5% is not as poor as we initially believed.
Table 4.1: A summary of the two experiments we ran with ImageMiner.

Experiment | Number of training workers | Images per training worker | Number of iterations | Number of cross-validations | Number of testing workers | Images per testing worker | Number of models
Experiment 1 | 5 | 20 | 10 | 10 | 4 | 13 | 5
Experiment 2 | 50 | 200 | 10 | 10 | 100 | 250 | 67
Table 4.2: Results for Experiment 1

Worker | Cross-Validation Accuracy | Cross-Validation STD
1 | 25 | 38.18813079
2 | 22.22222222 | 34.24674446
3 | 5.555555556 | 15.71348403
4 | 12.5 | 21.65063509
5 | 5 | 15

Results for experiment 1 with 5 workers, 20 images per worker, 10 iterations, and 10 cross-validations.
Table 4.3: Results for Test 1

Worker | Number correct | Number of total images
1 | 0 | 13
2 | 0 | 13
3 | 0 | 13
4 | 0 | 13

Results for the Testing module test 1 with 4 workers and 13 test images per worker with 5 different models.
Chapter 5
Experience of designing the system
We dealt with a variety of challenges during this project while building ImageMiner that
ended up limiting the amount of time available for experiments. These challenges ranged
from issues that had to be resolved before I could even start building the architecture to
processing results from ImageMiner and everything in between. I will discuss some of the challenges I faced below.
5.1
Data pre-processing and preparation
I dealt with several challenges before even writing ImageMiner.
The first issue was that once I had obtained the image metadata, I needed to cluster the
images. Since our goal was to cluster images into clusters of 200km or less, using k-means
clustering was impractical. This is because k-means does not set a limit on how large or
how spread out the cluster is. In addition, to use k-means one needs to know in advance the number of clusters to create, which is borderline impossible in our case. My first solution was to put all the images in one cluster and then find the two images furthest apart and divide the cluster in two based on which of those two images each image was closer to. However, this took too long because finding the two images furthest apart in a cluster required multiple
passes through all the images in that cluster. To solve this problem, I realized I just needed
to find two images that were greater than 200km apart to divide the cluster. I ended up
modifying my clustering algorithm to find two images that were 500km apart to divide the cluster. This greatly reduced the runtime of the clustering algorithm.
The other challenge I had to deal with was writing the image metadata to the database.
Although I had the image metadata, it was in one big text file that would be unwieldy to
section off to different workers. To solve this problem, I added each image’s metadata to a
DynamoDB database. This way each worker can select its own list of images to train and
test on. Writing all 825659 images took several days, which decreased the amount of time I had to experiment with and test ImageMiner.
5.2
ImageMiner Architecture
I also had to plan out how I wanted ImageMiner to run from start to finish. This meant
everything from the user inputs allowed to the types of messages that were passed to the
worker to how the worker processed the message. Each step was related to the other and I
had to carefully plan out how I wanted the output from one step to affect the input to the
next step.
Along the lines of user inputs, one problem I had to tackle was how to pass in feature
extractor and parameter information to ImageMiner. For the feature extractor, ImageMiner
needed to know what to call in the command line to run the feature extractor (since that was how to run VLFeat's version of SIFT), where to download the feature extractor, and what the output of the feature extractor was. The
parameter information had to include the feature and parameter names, the default value,
the bounds on parameter values, and the type of the parameter (integer or float). My
solution was to have the user pass in two files: one for feature information and one about
the parameters. Each line in the file was a new feature or parameter and the inputs on
each line were tab-separated. ImageMiner would read in and process each file and store the
information and pass it onto the worker to use.
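A sketch of how such a tab-separated parameter file might be parsed is shown below. The exact column order is an illustrative assumption, not the actual ImageMiner format.

```python
def load_parameter_file(path):
    """Parse a tab-separated parameter file.

    Assumed column order (an illustrative guess, not the exact ImageMiner
    format): feature name, parameter name, default, lower bound, upper
    bound, parameter type ("int" or "float").
    """
    parameters = []
    with open(path) as f:
        for line in f:
            feature, name, default, low, high, kind = line.rstrip("\n").split("\t")
            parameters.append({
                "feature": feature,
                "name": name,
                "default": float(default),
                "bounds": (float(low), float(high)),
                "is_integer": kind == "int",
            })
    return parameters
```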
Once the Training module finished running, I had to figure out how to pass the resulting models and the corresponding sets of parameters to the Testing module. The easiest way was to write the model, which was already a file, and the corresponding set of parameters, which needed to be written to a file, to Amazon S3. Each model and parameter file included, in its filename, a hash of the images used to train and test, so the Testing module could link each model to the correct set of parameters. The Testing module would then grab the models and parameters and process the files for testing.
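One way to implement this pairing, sketched in Python for illustration, is to hash the sorted list of image ids a worker used and embed that hash in both filenames; the hash function and filename layout below are assumptions, not the exact thesis convention.

import hashlib

def run_key(image_ids):
    # A stable hash over the set of images a worker trained and tested on.
    joined = ",".join(sorted(str(i) for i in image_ids))
    return hashlib.md5(joined.encode()).hexdigest()

def artifact_names(image_ids):
    key = run_key(image_ids)
    # The Testing module can pair a model with its parameters by matching the hash.
    return f"model_{key}.svm", f"params_{key}.txt"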
5.3
System issues
After the ImageMiner system was built, I ran into several issues dealing with the underlying
systems that ImageMiner was using.
5.3.1
Installing libraries
The first challenge was to make sure that SVMLight, the SVM classifier ImageMiner was
using, and SIFT, the feature extractor, ran on the EC2 instances that each worker was
running on. I had been testing on my local machine, so I made sure the SVMLight and
SIFT executables ran on my 64-bit Mac laptop. However, each EC2 instance ran on a Linux
Operating System, so I had to download different executables for SVMLight and SIFT to
ensure that both would run on the EC2 instances.
Another issue was that the instances I originally used did not have many of the libraries I needed installed. For example, I was using Python's sklearn library for k-means clustering, but the EC2 instances I was creating did not have that library. My first solution was to have each worker install sklearn from the command line while running the program, but that ran into its own set of problems. What I ended up doing instead was creating a new machine image, similar to the one I was already using but with the necessary libraries installed, and launching instances from that custom image.
5.3.2
Instance limits
One of the biggest issues I ran into was the limit on the number of instances I could create. I was limited to 100 r3.large EC2 instances, which severely slowed down ImageMiner, especially the Testing module. I looked into other instance types with significantly higher limits, but those were more expensive and required additional setup that I did not have time for, so I stuck with the r3.large instances. Since I was limited to 100 instances, I could only create a few instances at a time for the Testing module, which meant that each instance had to train and test on thousands of images instead of hundreds. The other option was to wait for all the Training slaves to finish before launching the Testing slaves, so that each slave could run on fewer images and finish faster. The drawback is that a single particularly slow instance can then delay the whole run. I ended up waiting for all the Training slaves to finish before launching the Testing slaves, so that each Testing slave would run faster.
5.4
Integrating Python Libraries
ImageMiner also uses several Python files and libraries to extract features and generate parameters, so I had to figure out how best to connect the Java and Python code and pass information between them. I considered Jython, a Java library that lets Python code run on the JVM, but I felt it required too much setup work. Instead, I had the Java code invoke the Python files through the command line, passing the necessary information as arguments. The Python file would receive the command-line arguments, process them, run its program, and print its results to standard output. Meanwhile, the Java program would wait on the Python process and read its output as it was written.
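The Python side of this bridge looked roughly like the sketch below: read the arguments, do the work, and print the result to stdout for the Java process to consume. The argument layout and the propose_parameters stand-in are hypothetical; the real tuner was the Gaussian Copula Process.

import sys

def propose_parameters(current, accuracies):
    # Stand-in for the real parameter tuner; simply echoes the current values
    # so that this sketch is self-contained and runnable.
    return current

def main():
    # Hypothetical argument layout: comma-separated parameter values and accuracies.
    current = [float(v) for v in sys.argv[1].split(",")]
    accuracies = [float(a) for a in sys.argv[2].split(",")]
    # The Java worker waits on this process and reads whatever is printed to stdout.
    print(",".join(str(v) for v in propose_parameters(current, accuracies)))

if __name__ == "__main__":
    main()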
Another problem with the Python files was that they would occasionally run into errors that were outside my control. I had to make sure these errors did not interrupt the module each worker was running, and I had to decide how to handle each one. In some cases, such as parameter generation, I would simulate a plausible output from the Python file; in other cases I would simply ignore the error and keep running.
5.5
Feature Extraction
A critical requirement of the SIFT feature extraction was that the images had to be in PGM format. However, the URL provided for each image served it as a JPG file, so I had to figure out how to convert each JPG to a PGM. After writing code to do just that, I ran into an issue where my ImageMiner jar could not find the library I was using for the conversion. The library was not on the build path, so it was never packaged into the jar; adding it to the build path solved the problem.
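The equivalent conversion is a one-liner with an imaging library; the sketch below uses Python's Pillow purely for illustration (the thesis code did this with a Java library on the build path).

import urllib.request
from PIL import Image

def fetch_as_pgm(url, jpg_path, pgm_path):
    # Download the JPG, then convert to 8-bit grayscale and save as PGM,
    # which is the format the SIFT executable expects.
    urllib.request.urlretrieve(url, jpg_path)
    Image.open(jpg_path).convert("L").save(pgm_path)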
I ran into another issue where the jar sometimes could not download the JPG from its URL, or running SIFT on the PGM caused an error. I had to handle these errors while also making sure the worker knew that the affected image could no longer be used for training or testing.
Once the SIFT descriptors were extracted, I needed to run k-means clustering on them so that each image could be described by the clusters its descriptors fell into, which made the images and features much easier to test on. However, I ran into several bugs while writing the k-means clustering code.
One issue was that a few SIFT descriptors contained only 2 or 3 numbers, while the rest contained 132. As a result, sklearn could not cluster the images, because Python could not turn the resulting list of descriptors into a NumPy array when the descriptors were not all the same length. I dealt with this by removing any descriptors that did not have 132 numbers; fortunately there were not many of them.
Sometimes, removing these descriptors, or processing SIFT files with very few descriptors to begin with, left fewer descriptors than the desired number of clusters. When this happened, I lowered the number of clusters to the number of remaining descriptors.
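A condensed sketch of this clean-up step is shown below, assuming each descriptor has been read as a list of numbers; the default number of clusters and the function name are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def cluster_descriptors(rows, n_clusters):
    # Drop the occasional truncated rows so the list converts cleanly to a NumPy array.
    full = [r for r in rows if len(r) == 132]
    X = np.asarray(full, dtype=float)
    # If fewer descriptors than requested clusters remain, shrink k accordingly
    # (assumes at least one full-length descriptor survives the filter).
    k = min(n_clusters, len(full))
    return KMeans(n_clusters=k).fit(X)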
5.6
SVM Classifier
One issue with the SVMLight classifier was that the training executable would occasionally take an extremely long time to run, or never finish at all. When the testing executable then ran, it could not find the corresponding model and would throw an error. In these cases, I assumed an accuracy of 0% for that test.
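One way to guard against this pattern is sketched below in Python (ImageMiner itself is Java); svm_learn is SVMLight's training executable, and the timeout value and handling strategy are illustrative assumptions rather than what the thesis system did.

import subprocess

def train_or_give_up(train_file, model_file, timeout_s=3600):
    # Run SVMLight's trainer; if it hangs or fails, report failure so the caller
    # can record 0% accuracy for this model instead of crashing later.
    try:
        subprocess.run(["svm_learn", train_file, model_file],
                       check=True, timeout=timeout_s)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError, FileNotFoundError):
        return False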
5.7
Processing results
Once the modules finished running, I had to process the results into a format that could be easily read and displayed in a graph. Some of this was relatively easy: the performance of the classifiers was stored in DynamoDB, so I simply had to query the database and process the results into a CSV file, as sketched below.
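A minimal boto3 sketch of that export (table and attribute names are illustrative):

import csv
import boto3

def export_results(table_name, out_path):
    table = boto3.resource("dynamodb").Table(table_name)
    resp = table.scan()
    items = list(resp["Items"])
    # DynamoDB scans are paginated; keep going until there is no LastEvaluatedKey.
    while "LastEvaluatedKey" in resp:
        resp = table.scan(ExclusiveStartKey=resp["LastEvaluatedKey"])
        items.extend(resp["Items"])
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=sorted({k for it in items for k in it}))
        writer.writeheader()
        writer.writerows(items)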
Others were more complicated. To get the results of each iteration of the parameter
tuning, I had to manually log into each worker and look through the output file to get the
accuracy at each iteration.
To get the results of the Testing module, the predictions for each model and each worker were written out to a file and stored on Amazon S3. However, when downloading this data I ran into a Java OutOfMemoryError due to lack of heap space. I dealt with this by increasing the size of the memory allocation pool with the JVM's -Xmx option, and by modifying my code to flush results to a file after the predictions for a certain number of images had been downloaded, combining the partial results manually afterwards.
5.8
Miscellaneous
One of the biggest issues with ImageMiner was the run-time of the two modules. Because each run was very long (2-4 days for the Training module with 200 images per worker, and about a day for the Testing module with 250 workers), debugging was difficult: it often took a day or two just to discover a bug at run-time, which would delay the next steps by another day. Luckily, I was able to catch many of the bugs early, so the amount of wasted time was minimized.
In addition, because each module took so long to run and the system was so large, testing individual pieces, such as the parameter generation or the k-means clustering, within the full system was effectively impossible. I had to write code specifically designed to test each part before integrating it. Of course, once each part was integrated, the transitions between parts would occasionally malfunction, but those problems were easier and faster to debug and fix.
Another minor issue was that I had stored all the executables and Python files on Dropbox, so when I launched 100 instances at once, the download traffic was heavy enough that the links were temporarily suspended. I dealt with this by moving the Python files to Amazon S3, reducing the traffic going to Dropbox.
5.9
Lessons Learned
In addition to all the problems listed above, I had to deal with the typical programming problems, such as choosing the best algorithm for a given task or debugging an error caused by a typo. Although dealing with all these issues was at times extremely frustrating, it was a great learning experience and I am glad to have worked through them.
I learned a lot while building ImageMiner. For example,
∙ Get the data pre-processing step done as soon as possible.
∙ Design the system carefully before building it.
∙ When building a system, it helps a lot to reuse other people's work, but do not be afraid to build your own components when needed.
∙ Testing on a local machine and testing on the cloud are completely different.
I wish I had known all this beforehand, but I am grateful to have learned these lessons.
Chapter 6
Conclusion
6.1
Future Work
There is a lot of future work left for ImageMiner.
First, due to time constraints, we were not able to perform as many experiments as
we would have liked. More experiments with different test cases should be run to better
determine the effectiveness of ImageMiner and parameter tuning.
One future test is to increase the number of iterations, to see whether the parameter tuning algorithm eventually produces better results than the default parameter values.
It would also be good to experiment with the clustering algorithm used. Currently, a
large number of the images are clustered into a few clusters, which could skew the results of
the Training and Testing modules. Modifying the algorithm so each cluster has roughly an
equal number of images could improve performance, as well.
In addition, there are many areas that can be improved. Some of them were touched on earlier, but we go into more detail here:
6.1.1
User Flexibility
Currently, ImageMiner is only designed to handle SIFT feature extraction. Since all of the feature processing is done by ImageMiner, it requires that features be output in a specific format. One possible solution is to have users write their own feature-processing code and plug it into ImageMiner. This would allow features to be output in any format the user desires and allow ImageMiner to handle a wider variety of features.
Additionally, ImageMiner can only handle one feature at a time. It would be ideal if
the user could input multiple features and have ImageMiner tune the parameters for all the
supplied features. There are several possible solutions, but the easiest way to do this would
be to modify the code to process and tune each feature one-by-one, instead of trying to
extract and tune all the provided features at once. ImageMiner could also create specific
workers for each feature so that each worker only has to tune one feature.
To increase flexibility, ImageMiner should be able to handle a variety of feature extractors
and be able to handle multiple features at once.
6.1.2
Speed
Although ImageMiner does explore a variety of parameter settings and produces a best prediction model and set of parameters, it runs fairly slowly: an ImageMiner EC2 instance running on 200 images takes anywhere from 1 to 5 days. Improving its speed would greatly increase its usability. The bottleneck comes from extracting the features and running k-means on the SIFT descriptors, so finding a faster way to extract and process the features would make the biggest difference.
6.2
Future Goals
The goal of ImageMiner is to tune parameter values for feature extraction so that the best possible features are extracted for a machine learning algorithm. Although there is still room for improvement, we hope that ImageMiner can help researchers extract better features and obtain better results.
Appendix A
ImageMiner Interface
This appendix describes how a user interacts with ImageMiner and serves as a guide to how it works. The source code is available at https://github.mit.edu/ALFAGroup/DeepMining.Images/tree/master/ImageMiner.
The user should first either build the jar from the source code or download the JAR from https://github.mit.edu/ALFAGroup/DeepMining.Images/tree/master/ImageMiner. The user also needs JRE 7 and an AWS account to get started. Make sure the AWS account credentials can be found by the jar. If not, the user should run
export AWS_SECRET_ACCESS_KEY=xxxxxxxx
and
export AWS_ACCESS_KEY_ID=xxxxxxxx
to set their AWS credentials on the command line.
A.1
Training
The Training module determines the best parameters to use for feature extraction and creates a classifier using those parameters. It accepts the following command line arguments:
-a, --ami <arg>             Amazon Machine Image to launch instances with
                            (default is the public ImageMiner AMI)
-b, --bucket <arg>          Bucket to upload models to
-d, --num-iterations <arg>  Number of iterations for each worker
-f, --features <arg>        File containing feature extraction scripts. This file
                            should contain a row for each feature in the format
                            <feature_name> <dropbox_url_of_script> <output_file_format(s)>.
                            There can be multiple output_file_formats. The output file
                            format should contain the type of format for the output
                            file(s) (.doc, .txt, etc). If no output file format is
                            specified, the output is assumed to be written to stdout.
-g, --tag <arg>             Each instance will be tagged with this value.
-h, --help                  Show the help menu.
-i, --initialize-tables     Pass this argument to initialize the tables for the first
                            time. If tables with the given table name already exist,
                            they will be deleted and recreated.
-k, --num-train <arg>       Number of images to train on
-l, --num-cross <arg>       Number of cross-validations
-m, --num-test <arg>        Number of images to test on
-n, --num-instances <arg>   Number of instances to create
-p, --iam-profile <arg>     Name of IamProfile to use for EC2 instances
                            (must have access to DynamoDB, S3, and SQS)
-r, --parameters <arg>      File containing parameter info. This file contains a row
                            for each parameter in the format '<feature> <param_name>
                            <default_value> <lower_bound>,<upper_bound> 0 (for
                            integer-only params) or 1 (for real-numbered values)'
-t, --table <arg>           Name of table to store in the database (names will be
                            '<table_name>_feature_parameters')
-y, --instance-type <arg>   Amazon EC2 instance type to launch (possibilities:
                            r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge,
                            c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge)
An example command for running the Training module might look like
java beatDB/Main imageminer -b imageminer -d 10 -f features.txt -k 8000 -l 10 -m 2000 -n 50 -p beatdbtestrole -r parameters.txt -t testing
This will create 50 r3.large instances using the public ImageMiner AMI. The feature and parameter information is stored in features.txt and parameters.txt, respectively. Each worker will grab 200 images (160 training, 40 testing) from the images table to train and test on. Each worker will run 10 cross-validations on the training data and run 10 iterations to produce 10 models. It will pick the best model, write that model and the corresponding set of parameters to S3, and write the performance of the best model to the feature_parameters table.
A.2
Testing
The Testing module grabs all the models and parameters generated by the ImageMiner module and tests them to determine how good the models are. It takes the following command line arguments:
-a, --ami <arg>             Amazon Machine Image to launch instances with
                            (default is the public ImageMiner AMI)
-b, --bucket <arg>          Bucket to grab models and parameters from
-g, --tag <arg>             Each instance will be tagged with this value.
-h, --help                  Show the help menu.
-i, --initialize-tables     Pass this argument to initialize the tables for the first
                            time. If tables with the given table name already exist,
                            they will be deleted and recreated.
-m, --num-test <arg>        Number of images to test on
-n, --num-instances <arg>   Number of instances to create
-p, --iam-profile <arg>     Name of IamProfile to use for EC2 instances
                            (must have access to DynamoDB, S3, and SQS)
-t, --table <arg>           Name of table to store in the database (names will be
                            '<table_name>_test_results')
-y, --instance-type <arg>   Amazon EC2 instance type to launch (possibilities:
                            r3.large, r3.xlarge, r3.2xlarge, r3.4xlarge, r3.8xlarge,
                            c3.large, c3.xlarge, c3.2xlarge, c3.4xlarge, c3.8xlarge)
An example command for running the Testing module might look like
java beatDB/Main testing -b imageminer -m 5000 -n 50 -p beatdbtestrole -t testing
This creates 50 r3.large instances using the public ImageMiner AMI. It downloads the models and parameters from the imageminer bucket on S3. Each worker then grabs 100 images to test on and, for each model-parameter pair, extracts features based on the parameters and runs a prediction using the model. The worker then collects every model's prediction for each image and takes a majority vote to determine the final classification before comparing it to the actual classification.
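The majority vote at the end can be sketched in a few lines of Python (a simple plurality over the per-model predictions; ties are broken arbitrarily):

from collections import Counter

def majority_vote(predictions):
    # predictions: list of cluster labels, one per model, for a single test image.
    return Counter(predictions).most_common(1)[0][0]

For example, majority_vote([3, 7, 3]) returns 3.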
Appendix B
Results
The following table displays the cross-validation accuracy and standard deviation of the models generated from the Training module.
Table B.1: The performance of each model generated from the ImageMiner module

Model   Crossvalidation Accuracy   Crossvalidation STD
1       0.625                      1.875
2       1                          3
3       1                          3
4       1                          3
5       1.111                      3.333
6       1.111                      3.333
7       1.234444444                3.491536151
8       1.234444444                3.491536151
9       1.25                       3.75
10      1.25                       3.75
11      1.25                       3.75
12      1.25                       3.75
13      1.25                       3.75
14      1.25                       3.75
15      1.25                       3.75
16      1.25                       3.75
17      1.25                       3.75
18      1.429                      4.287
19      1.429                      4.287
20      1.667                      5.001
21      2                          4
22      2                          4
23      2                          6
24      2                          6
25      2                          6
26      2                          6
27      2                          4
28      2.111                      4.22928942
29      2.111                      4.22928942
30      2.222                      6.666
31      2.222                      4.444
32      2.222                      4.444
33      2.222                      4.444
34      2.361                      4.73221819
35      2.5                        7.5
36      2.5                        7.5
37      2.5                        7.5
38      2.5                        5
39      2.5                        5
40      2.679                      5.372929276
41      2.777777778                5.196746371
42      2.777777778                5.196746371
43      2.777777778                5.196746371
44      2.857                      8.571
45      2.857                      4.738552627
46      2.858                      5.716
47      2.976666667                5.584792844
48      3                          4.582575695
49      3.241111111                6.14270741
50      3.25                       6.712860791
51      3.25                       6.712860791
52      3.333                      5.091241597
53      3.333                      9.999
54      3.333                      5.091241597
55      3.333                      5.091241597
56      3.333                      7.113871028
57      3.333                      9.999
58      3.429                      5.353724778
59      3.651                      5.63711176
60      3.703333333                10.47460845
61      3.75                       8.003905297
62      3.75                       5.728219619
63      3.75                       8.003905297
64      3.75                       5.728219619
65      3.75                       8.003905297
66      3.75                       5.728219619
67      3.929                      6.019416002
68      3.929                      8.21482617
69      3.929                      8.21482617
70      4.166666667                5.89255651
71      4.166666667                8.333333333
72      4.286                      12.858
73      4.287                      6.548500668
74      4.287                      6.548500668
75      4.287                      6.548500668
76      4.444                      5.442766208
77      4.444                      5.442766208
78      4.5                        7.141428429
79      4.5                        7.141428429
80      4.583                      7.505808484
81      4.583                      10.2815369
82      5                          6.123724357
83      5                          15
84      5                          8.291561976
85      5                          15
86      5                          6.123724357
87      5                          8.291561976
88      5                          15
89      5                          15
90      5                          11.45643924
91      5                          6.123724357
92      5.555555556                15.71348403
93      5.833                      11.8137632
94      6.217777778                7.04466824
95      6.429                      15.13566546
96      7.407777778                20.9523596
97      7.5                        10
98      8.691                      11.78542443
99      10                         30
100     10                         30
101     10                         30
102     10                         30
103     10                         30
104     15                         32.01562119
The following table displays the test accuracy of the models generated from the Training
module on test data.
Table B.2: Accuracy of each model from the Testing module

Model      Test Accuracy
1          0.0
2          0.12650995755794972
3          0.11128972424879437
4          0.16661112962345886
5          0.6896833170094678
6          0.2226069750185506
7          0.37136793992817496
8          0.7876265099575579
9          0.12242899118511265
10         0.11834802481227553
11         0.5523723154293252
12         0.041218416388442355
13         0.049464138499587806
14         0.0741992662517004
15         0.07831499113804048
16         0.7060071825008162
17         0.09386222657525302
18         0.14283382304929806
19         0.08161932745674175
20         0.07421150278293136
21         0.3998186389678909
22         0.11834802481227553
23         0.1030715316429602
24         0.7753836108390467
25         0.15099575579497226
26         0.21629121776036564
27         0.11426705843943846
28         0.7794645772118838
29         0.15915768854064644
30         0.35096310806398956
31         0.1566622691292876
32         0.5081639453515495
33         0.3789456150578829
34         0.2061430632859204
35         0.7625721352019786
36         0.1305909239307868
37         0.17316017316017315
38         0.04946209966613083
39         0.3297609233305853
40         0.044890630101207966
41         0.06594946622150777
42         0.21221025138752855
43         0.6692784851452824
44         0.08570029382957885
45         0.05359719645433931
46         0.12650995755794972
47         0.3795298726738492
48         0.5305256284688215
49         0.267941794797807
50         0.10610512569376428
51         0.1850917045263335
52         0.1401772830344259
53         0.5958210904342148
54         0.16079158936301793
55         0.11834802481227553
56         0.38769180541952336
57         0.15507672216780935
58         0.5468494939601698
59         0.820246486130003
60         0.7998694090760692
61         0.21846661170651277
62         0.0948258091115234
63         0.07753836108390467
64         0.30093165141396655
65         0.3131954174565235
66         0.5672543258243552
67         0.12202819272038713
majority   0.008202742409402547
The following table shows the per-class accuracy for each cluster from testing the models on the test data. Only clusters with non-zero accuracies are shown; only 274 out of the 2766 clusters had non-zero accuracies.
Table B.3: Per-class accuracy for each cluster from the Testing module
Cluster
Test Accuracy
1
2
93
23.92312789927104
2.0620179442782938
1.6635859519408502
545
2071
173
16
294
1.596351197263398
1.4925373134328357
1.4796547472256474
1.478743068391867
1.476510067114094
1461
679
442
314
737
1278
306
1765
195
21
120
67
28
1962
2477
1915
361
1315
1171
1441
2640
75
13
136
590
1602
373
999
1498
33
14
139
562
1674
1098
2095
995
2628
94
58
220
43
2149
40
115
1.4705882352941175
1.3513513513513513
1.3084112149532712
1.3054830287206265
1.2048192771084338
0.9852216748768473
0.9358288770053476
0.8902077151335311
0.7944389275074478
0.7875953728771843
0.7619047619047619
0.7541995200548508
0.7501071581654523
0.7462686567164178
0.7462686567164178
0.7407407407407408
0.7393715341959335
0.7380073800738007
0.7371007371007371
0.7352941176470588
0.7352941176470588
0.6947660954145438
0.6730137885751806
0.6260671599317018
0.6157635467980296
0.60790273556231
0.5988023952095809
0.591715976331361
0.5555555555555556
0.5286529921759358
0.5085464048594435
0.49813200498132004
0.4975124378109453
0.49504950495049505
0.49261083743842365
0.49261083743842365
0.49019607843137253
0.49019607843137253
0.46982291290205996
0.46748831279218017
0.431832202344232
0.4244282008960151
0.42194092827004215
0.412829469672912
0.4108885464817668
7
419
486
285
26
1876
1225
835
420
649
824
943
2405
1272
1365
2197
36
509
125
565
128
27
221
323
2339
1509
828
1671
2410
570
1095
1875
2292
714
1183
1245
1404
50
644
217
31
104
62
289
593
0.4089775561097257
0.40540540540540543
0.4037685060565276
0.4016064257028112
0.3923766816143498
0.38314176245210724
0.37174721189591076
0.3703703703703704
0.36900369003690037
0.36900369003690037
0.36900369003690037
0.36900369003690037
0.36900369003690037
0.3676470588235294
0.3676470588235294
0.3676470588235294
0.36322360953461974
0.3401360544217687
0.3336510962821735
0.32786885245901637
0.32377428307123035
0.3114658360911038
0.3110419906687403
0.31007751937984496
0.3048780487804878
0.3003003003003003
0.2958579881656805
0.2958579881656805
0.2958579881656805
0.2949852507374631
0.2949852507374631
0.2949852507374631
0.2949852507374631
0.29411764705882354
0.29411764705882354
0.29411764705882354
0.29411764705882354
0.27165710836100215
0.2684563758389262
0.2682763246143528
0.2658396101019052
0.25665704202759065
0.2548853016142736
0.24968789013732834
0.24630541871921183
1246
181
156
365
471
1392
5
304
23
20
86
817
11
163
210
34
378
454
300
532
658
363
214
346
275
553
865
673
726
1114
84
4
161
1253
1342
315
1082
331
460
232
445
580
71
166
3
0.2457002457002457
0.24554941682013504
0.24549918166939444
0.24509803921568626
0.24509803921568626
0.24509803921568626
0.23443910444262106
0.2288329519450801
0.22742779167614283
0.22711787417669774
0.22361359570661896
0.2178649237472767
0.21671407287010702
0.21398002853067047
0.21261516654854712
0.21136683889149835
0.21119324181626187
0.21097046413502107
0.21052631578947367
0.21052631578947367
0.21052631578947367
0.2103049421661409
0.20147750167897915
0.19685039370078738
0.1926782273603083
0.18832391713747645
0.1851851851851852
0.18484288354898337
0.18484288354898337
0.18484288354898337
0.18475750577367206
0.1846892603195124
0.18450184501845018
0.18450184501845018
0.18450184501845018
0.1841620626151013
0.1838235294117647
0.17436791630340018
0.1737619461337967
0.17301038062283738
0.16366612111292964
0.16366612111292964
0.1596169193934557
0.15337423312883436
0.15293442936341045
144
171
514
97
242
80
754
140
149
1019
1068
567
608
6
101
74
347
8
250
186
333
353
87
19
142
63
105
615
35
462
286
324
272
398
10
266
29
45
612
230
264
334
370
1088
150
0.1525165226232842
0.15236160487557138
0.15037593984962408
0.14829461196243204
0.14814814814814814
0.14787430683918668
0.14771048744460857
0.14756517461878996
0.14749262536873156
0.14749262536873156
0.14749262536873156
0.14705882352941177
0.14705882352941177
0.14513788098693758
0.14238253440911247
0.14231499051233396
0.1402524544179523
0.14023457419683832
0.13568521031207598
0.13513513513513514
0.13513513513513514
0.13513513513513514
0.13452914798206278
0.1306701512040321
0.13054830287206268
0.1303780964797914
0.13009540329575023
0.125
0.12315270935960591
0.12300123001230012
0.12285012285012285
0.12285012285012285
0.1226993865030675
0.12254901960784313
0.12181916621548457
0.11785503830288745
0.11693171188026193
0.11574074074074073
0.11441647597254005
0.11402508551881414
0.11376564277588168
0.11376564277588168
0.11350737797956867
0.11312217194570137
0.10911074740861974
531
329
169
54
79
842
527
317
349
309
212
621
188
876
32
291
207
240
257
274
49
22
280
423
197
350
59
227
308
82
12
60
24
406
219
284
107
262
25
138
15
260
216
0.10660980810234541
0.10504201680672269
0.10183299389002036
0.10131712259371835
0.10090817356205853
0.09871668311944717
0.09861932938856016
0.09842519685039369
0.09842519685039369
0.09832841691248771
0.09828009828009827
0.09823182711198428
0.09813542688910697
0.09803921568627451
0.09420631182289213
0.09250693802035154
0.09216589861751152
0.09208103130755065
0.09191176470588235
0.09191176470588235
0.09078529278256922
0.08976660682226212
0.08703220191470844
0.08673026886383348
0.08665511265164644
0.08650519031141869
0.08389261744966443
0.08278145695364239
0.08244023083264633
0.08240626287597858
0.08119519324455993
0.0790722192936215
0.07791195948578107
0.07776049766718507
0.07739938080495357
0.07446016381236038
0.07390983000739099
0.07390983000739099
0.07316627034936894
0.07102272727272728
0.07023705004389816
0.0702247191011236
0.0700770847932726
30
175
296
53
213
193
251
218
126
38
162
52
92
141
98
88
151
131
102
153
143
154
37
44
47
108
95
159
119
168
89
69
9
64
118
46
65
57
56
42
39
17
51
0.06944444444444445
0.0675219446320054
0.0675219446320054
0.06702412868632708
0.06697923643670461
0.06693440428380187
0.06150061500615006
0.06146281499692685
0.05945303210463733
0.05732301519059903
0.05694760820045558
0.05688282138794084
0.05672149744753262
0.05665722379603399
0.0546448087431694
0.05279831045406547
0.051150895140664954
0.04962779156327543
0.04940711462450593
0.049164208456243856
0.04906771344455348
0.047709923664122134
0.047505938242280284
0.04691531785127844
0.046264168401572985
0.046125461254612546
0.04601932811780948
0.0447427293064877
0.044722719141323794
0.04342162396873643
0.04230118443316413
0.04106776180698152
0.03756574004507889
0.03607503607503607
0.03594536304816679
0.03506311360448808
0.03428179636612959
0.030759766225776686
0.027932960893854747
0.026281208935611037
0.02382654276864427
0.0209819555182543
0.020475020475020478
Bibliography
[1] MediaEval Multimedia Benchmark. Mediaeval benchmarking initiative for multimedia
evaluation. http://www.multimediaeval.org/mediaeval2014/placing2014/.
[2] Eric Brochu, Vlad M Cora, and Nando de Freitas. A tutorial on bayesian optimization
of expensive cost functions, with application to active user modeling and hierarchical
reinforcement learning. eprint arXiv:1012.2599, arXiv.org, December 2010.
[3] L. Cao, J. Yu, J. Luo, and T. Huang. Enhancing semantic and geographic annotation of
web images via logistic canonical correlation regression. ACM International Conference
on Multimedia, pages 125–134, 2009.
[4] J. Choi, H. Lei, V. Ekambaram, P. Kelm, L. Gottlieb, T. Sikora, K. Ramchandran,
and G. Friedland. Human vs. machine: Establishing a human baseline for multimodal
location estimation. ACM International Conference on Multimedia, pages 866–867,
2013.
[5] Seth Golub. heatmap.py. http://www.sethoscope.net/heatmap/.
[6] Vineet Gopal. Physiominer: A scalable cloud based framework for physiological waveform mining. Master’s thesis, MIT, 2014.
[7] J. Hays and A. A. Efros. Im2gps: Estimating geographic information from a single
image. CVPR Computer Vision and Pattern Recognition Conference, 2008.
[8] Thorsten Joachims. Svmlight. http://svmlight.joachims.org.
[9] P. Kelm, S. Schmiedeke, J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran, and
T. Sikora. A novel fusion method for integrating multiple modalities and knowledge
for multimodal location estimation. ACM Multimedia Workshop on Geotagging and Its
Applications in Multimedia, 2013.
[10] M. Larson, M. Soleymani, P. Serdyukov, S. Rudinac, C. Wartena, V. Murdock, G. Friedland, R. Ordelman, and G. J.F. Jones. Automatic tagging and geotagging in video
collections and communities. ACM ICMR International Conference, pages 51–54, 2011.
[11] David G. Lowe. Object recognition from local scale-invariant features. Computer Vision,
2:1150–1157, 1999.
[12] Frank Michel. How many public photos are uploaded to flickr every day, month, year?
https://www.flickr.com/photos/franckmichel/6855169886/in/photostream/.
[13] OpenStreetMap. Osm viz. http://cbick.github.io/osmviz/html/index.html.
[14] O. A.B. Penatti, L. T. Li, J. Almeida, and R. da S. Torres. A visual approach for video
geocoding using bag-of-scenes. ACM ICMR International Conference on Multimedia
Retrieval, 2012.
[15] David A. Shamma.
One hundred million creative commons flickr images
for
research.
http://yahoolabs.tumblr.com/post/89783581601/
one-hundred-million-creative-commons-flickr-images.
[16] M. Trevisiol, H. Jégou, J. Delhumeau, and G. Gravier. Retrieving geo-location of videos
with a divide and conquer hierarchical multimodal approach. ACM ICMR International
Conference on Multimedia Retrieval, 2013.
[17] VLFeat. Vlfeat. http://www.vlfeat.org/.