Associative Data Schemes for Cloud Computing
Amir Basirat
PhD Candidate
Amir.Basirat@monash.edu
Supervisor: Dr Asad Khan
Clayton School of IT, Monash University
STINT Workshop, Lulea, Sweden - May 2012
Contents
1. Cloud Computing
2. Hadoop MapReduce
3. Pattern Recognition and Distributed Approach
4. Graph Neuron for Scalable Pattern Recognition
5. HGN and DHGN
6. Research Objective
7. Web-based GN
8. EdgeHGN
9. Simulation Showcase
What is Cloud Computing?
The vision of cloud computing encompasses a general shift of
computer processing, storage, and software delivery away from the
desktop and local servers, across the network, and into the next
generation of data centers hosted by large infrastructure companies.
Big Data!
An IDC estimate put the size of the "digital universe" at 0.18 zettabytes
in 2006, and forecast a tenfold growth to 1.8 zettabytes by 2011.
This flood of data is coming from many sources. Consider the following:
• The New York Stock Exchange generates about one terabyte of new trade
  data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of
  storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data, and is growing at a
  rate of 20 terabytes per month.
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15
  petabytes of data per year.
Challenge?
Our existing capability to generate data
seems to outstrip our capability to analyze it.
Data Management in Cloud
Any data management scheme deployed for clouds needs to address several
underlying issues (Abadi, 2009), including:
• the capability to parallelise the data workload
• security concerns that result from storing data at an untrusted host
• data replication functionality
Thus the question of how to process immense data sets effectively is
becoming increasingly urgent.
Hadoop
In a nutshell, what Hadoop provides:
“A reliable shared storage and analysis system. The storage is
provided by HDFS and analysis by MapReduce”
(Hadoop, 2011)
MapReduce
The MapReduce programming model requires solutions to be expressed with two
functions: Map and Reduce.
• A map function takes a key/value pair, computes on it, and emits a set of
  intermediate key/value pairs as output.
• A reduce function merges all intermediate values associated with the same
  intermediate key, executes some computation on them, and emits the final
  output.
(Hadoop, 2011)
Word Count in MapReduce
Pseudocode for the word count algorithm in MapReduce

class MAPPER
    method MAP(docid a, doc d)
        for all term t in doc d do
            EMIT(term t, count 1)

class REDUCER
    method REDUCE(term t, counts [c1, c2, …])
        sum = 0
        for all count c in counts [c1, c2, …] do
            sum = sum + c
        EMIT(term t, count sum)
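The pseudocode above can be tried out in a single process. The following is a minimal Python sketch, not Hadoop code: the names map_fn, reduce_fn and run_word_count are hypothetical, and the in-memory grouping step only stands in for the shuffle and sort that a real MapReduce runtime performs between the two phases.

```python
from collections import defaultdict


def map_fn(doc_id, doc):
    """Map: emit an intermediate (term, 1) pair for every term in the document."""
    for term in doc.split():
        yield term, 1


def reduce_fn(term, counts):
    """Reduce: sum all counts that share the same intermediate key."""
    return term, sum(counts)


def run_word_count(docs):
    """Run map, grouping, and reduce in one process (no Hadoop involved)."""
    # Stand-in for the shuffle phase: group intermediate values by key.
    grouped = defaultdict(list)
    for doc_id, doc in docs.items():
        for term, count in map_fn(doc_id, doc):
            grouped[term].append(count)
    # Reduce phase: visit the keys in sorted order and sum their counts.
    return dict(reduce_fn(term, grouped[term]) for term in sorted(grouped))


# Two tiny example documents (made up for illustration).
word_counts = run_word_count({"d1": "big data in the cloud", "d2": "data in the cloud"})
```

Here word_counts maps each term to its total across both documents, so "data" and "cloud" each map to 2 and "big" maps to 1.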
Challenges and Hurdles in MapReduce
• The Map function operates on the assumption that all related data is
  distributed vertically, i.e. that records are uniformly distributed across
  the network. However, parts of related records may be stored at different
  physical locations.
• Intermediate records need to be sorted before they are input to the
  Reduce function.
• The solution must be expressed in terms of Map and Reduce functions working
  on key/value pairs, which in some cases, such as multi-stage processes, may
  not be possible or natural.
• Moreover, dependency on HDFS for data storage and retrieval can create
  single points of failure for the MapReduce infrastructure, especially at
  master nodes.
Contents
1. Cloud Computing
2. Hadoop MapReduce
3. Distributed Pattern Recognition
4. Graph Neuron (GN)
5. Hierarchical Graph Neuron (HGN)
6. Distributed Hierarchical Graph Neuron (DHGN)
7. Edge Detecting Hierarchical Graph Neuron (EdgeHGN)
8. Simulation Showcase
9. Question Time
Existing data management schemes do not work well when data is partitioned
dynamically among numerous available nodes.
Approaches towards scalable data management in the cloud, which offer greater
portability, manageability and compatibility of applications and data, are
yet to be fully realised.
Solution?
To develop a distributed data access scheme that enables data
storage and retrieval by association.
Treat data records as patterns.
As a result, data storage and retrieval are performed using a distributed
pattern recognition approach that is implemented through the integration
of loosely-coupled computational networks, followed by a
divide-and-distribute approach that allows these networks to be distributed
within the cloud dynamically.
Associative Model of Data
This associative model treats data records as patterns, and hence it
does not matter how the data is represented.
The associative model uses a single, common structure for all
types of data.
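To make the idea of retrieval by association concrete, here is a minimal Python sketch. It is an illustration only and not the scheme proposed in this talk: the class AssociativeStore, the linear scan, and the use of Hamming distance as the similarity measure are all assumptions chosen for brevity. Records are stored as equal-length bit strings and recalled by presenting a similar pattern rather than a key.

```python
def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


class AssociativeStore:
    """Toy store that keeps records as patterns and recalls them by similarity."""

    def __init__(self):
        self.patterns = []

    def store(self, pattern):
        # Keep each distinct pattern once.
        if pattern not in self.patterns:
            self.patterns.append(pattern)

    def recall(self, probe):
        # Return the stored pattern closest to the probe, and its distance.
        best = min(self.patterns, key=lambda p: hamming(p, probe))
        return best, hamming(best, probe)


store = AssociativeStore()
store.store("1011001")
store.store("0100110")
# Probe with a distorted copy of the first pattern (one flipped bit).
match, distance = store.recall("1011101")
```

The probe differs from the first stored pattern in one position, so that pattern is the one recalled, with a distance of 1.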
Distributed Pattern Recognition
A distributed computing approach offers seemingly unlimited scalability as
the number of patterns grows, thanks to the rapid advent of network computing
technology that enables processing to be performed within the body of a
network rather than relying on exhaustive single-CPU utilization.
Existing approaches still lag behind because of the highly complex
recognition algorithms they implement.
The neural network approach offers a promising tool for large-scale pattern
recognition. However, several issues are related to its implementation,
including:
• convergence problems
• complex iterative learning procedures
• low scalability with regard to the training data required for optimum
  recognition
An eight node GN is in the process of storing patterns (Khan, 2002).
P1 (RED), P2 (BLUE), P3 (BLACK), and P4 (GREEN)
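The storing process can be approximated in a few lines of Python. This is a heavily simplified sketch under stated assumptions, not the published GN algorithm: the class name GraphNeuronArray, the two-symbol alphabet, the four-position patterns (four positions times two values gives eight nodes), and the example patterns are all hypothetical. Each node stands for one (position, value) pair and remembers which neighbouring values it has been activated with; a pattern counts as recalled only when every activated node has seen its neighbours before.

```python
class GraphNeuronArray:
    """Simplified GN: one node per (position, value) pair in the pattern."""

    def __init__(self, length):
        self.length = length
        # Per-node memory: the set of (left value, right value) contexts seen.
        self.bias = {}

    def present(self, pattern, store=True):
        """Return True if every activated node recognises its neighbours."""
        if len(pattern) != self.length:
            raise ValueError("pattern length does not match the array")
        recalled = 0
        for i, value in enumerate(pattern):
            left = pattern[i - 1] if i > 0 else None
            right = pattern[i + 1] if i < self.length - 1 else None
            node, context = (i, value), (left, right)
            if context in self.bias.get(node, set()):
                recalled += 1
            elif store:
                # New adjacency information: memorise it at this node.
                self.bias.setdefault(node, set()).add(context)
        return recalled == self.length


gn = GraphNeuronArray(4)
first_seen = gn.present("XXOO")               # new pattern, gets stored
recalled = gn.present("XXOO", store=False)    # same pattern presented again
unknown = gn.present("OOXX", store=False)     # pattern never stored
```

Each node only compares its own small context, so no iterative training is involved; storing a pattern takes a single pass over its elements.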
Hierarchical Graph Neuron (HGN)
HGN compositions for two-dimensional (7x5) and three-dimensional (7x5x3) pattern sizes
Distributed Hierarchical Graph Neuron (DHGN)
DHGN distributed pattern recognition architecture
(Muhamad Amin and Khan, 2009).
Research Objectives
• Redesigning the data management architecture from a scalable associative
  computing perspective, to create database-like functionality that can scale
  up or down dynamically over the available infrastructure without
  interruption or degradation
• Investigating a distributed data access scheme that enables data storage and
  retrieval by association, with data records treated as patterns
• Processing the database and handling the dynamic load using a distributed
  pattern recognition approach
• Developing an intelligent MapReduce framework that allows complex data
  representations to be used as keys for Map operations
• Reducing cloud storage fragmentation by implementing a divide-and-distribute
  approach
• Enhancing the existing cloud data management models for scalability
• Validating the results and finding the asymptotic limits of the technique
  through a rigorously designed computer simulation environment
Progress to Date
• Proposing a Web-based GN for Real-time Image Recognition
Web-based GN
(a) Total number of positive and negative matches.
(b) Distortion rates for each line of image (each constructed HGN).
Image distortion rates vs. rotation degrees.
Edge Detecting Hierarchical Graph Neuron (EdgeHGN)
A 7-by-7 bit binary character "A" and its 7 equally sized DHGN subnets
Reducing the number of neurons by applying a drop-fall technique
Drop Fall Scheme
• Drop-fall is often used for dividing touching pairs of digits into isolated
  characters. The drop-fall algorithm simulates the path produced by a drop of
  water falling from above the character and sliding downwards along the
  contour under the action of gravity.
• When the drop gets stuck in a groove, it melts the character's stroke and
  then continues to fall. The dividing path produced by the drop-fall
  algorithm depends on three aspects: a start point, movement rules, and
  direction.
• There are four possible directions, which generally produce four different
  paths to divide touching digits. The path can start on the left or right
  side and can evolve downwards or upwards. One of the four is likely to
  produce the right result.
• Therefore, a set of drop-fall algorithms consists of four methods that try
  to segment a block by simulating a drop-falling process: the descending-left,
  descending-right, ascending-left, and ascending-right algorithms.
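A descending variant of the scheme above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in EdgeHGN: the function name drop_fall_descending, the explicit start_col argument, and the movement rules (try straight down, then down-right, then down-left, otherwise cut through the stroke) are assumptions made to keep the example short. The grid is a binary image where 1 is ink and 0 is background.

```python
def drop_fall_descending(grid, start_col):
    """Trace a top-to-bottom dividing path through a binary grid (1 = ink)."""
    rows, cols = len(grid), len(grid[0])
    col = start_col
    path = [(0, col)]
    for row in range(rows - 1):
        below = row + 1
        # Assumed movement rules, in priority order: straight down,
        # down-right, down-left. Only background cells are entered freely.
        for next_col in (col, col + 1, col - 1):
            if 0 <= next_col < cols and grid[below][next_col] == 0:
                col = next_col
                break
        # If no candidate was background, the drop is stuck in a groove:
        # col is unchanged, so the path cuts straight down through the stroke.
        path.append((below, col))
    return path


# Small made-up grid with a gap that shifts right and one solid row of ink.
grid = [
    [1, 0, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
path = drop_fall_descending(grid, start_col=1)
```

In this sketch the path advances one row per step, so it always terminates. An ascending variant could be obtained by reversing the row order of the grid, and a left or right variant by choosing the start column on the corresponding side.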
EdgeHGN Performance
Disclaimer
I am not proposing any computer vision scheme for image processing here.
I am not suggesting in any way that my scheme is capable of competing against
the many image processing and face recognition algorithms that are treated in
the literature.
I am doing pattern matching, and I could use any form of data representation
for the purpose of my research.
Images are complex matrices of values, but people can relate to images very
well, and that is why I found them an easy way to illustrate the effectiveness
and strength of my proposed model.
Binary Image Recognition
Fifty different individuals in the face image dataset obtained from the Face Recognition Data.
Sobel Operator
In simple terms, the Sobel operator calculates the gradient of the image intensity at
each point, giving the direction of the largest possible increase from light to dark and
the rate of change in that direction.
The result therefore shows how "abruptly" or "smoothly" the image changes at that
point, and therefore how likely it is that that part of the image represents an edge, as
well as how that edge is likely to be oriented.
Edge map after applying Global Binary Signature and Sobel's edge detection
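The gradient computation described above can be sketched in pure Python. This is a minimal illustration using the standard 3x3 Sobel kernels; the function name sobel, the list-of-lists image format, and the choice to leave border pixels at zero are assumptions, and the Global Binary Signature step is not covered.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel(image):
    """Return gradient magnitude and direction maps (border pixels stay 0)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            # Slide both kernels over the 3x3 window centred on (r, c).
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            magnitude[r][c] = math.hypot(gx, gy)   # rate of change
            direction[r][c] = math.atan2(gy, gx)   # gradient angle in radians
    return magnitude, direction


# Made-up 4x4 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1] for _ in range(4)]
magnitude, direction = sobel(image)
```

A large magnitude marks a pixel where the intensity changes abruptly, which is what makes it a likely edge; thresholding the magnitude map is one common way to turn it into a binary edge map.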
References
Abadi, D. J. (2009). Data Management in the Cloud: Limitations and Opportunities, Bulletin
of the Technical Committee on Data Engineering, pp. 3-12.
Khan, A. I. and Muhamad Amin, A. (2007). One shot associative memory method for
distorted pattern recognition, AI 2007: Advances in Artificial Intelligence, Springer,
Berlin/Heidelberg, pp. 705-709.
Muhamad Amin, A. and Khan, A. I. (2009). Collaborative-comparison learning for complex
event detection using distributed hierarchical graph neuron (DHGN) approach in wireless
sensor network, AI 2009: Advances in Artificial Intelligence, Springer, Berlin/Heidelberg,
pp. 111-120.
Nasution, B. B. and Khan, A. I. (2008). A hierarchical graph neuron scheme for real-time
pattern recognition, IEEE Transactions on Neural Networks 19(2): 212-229.
Shiers, J. (2009). Grid today, clouds on the horizon, Computer Physics Communications,
pp. 559-563.
Welsh, M., Malan, D., Duncan, B., Fulford-Jones, T. and Moulton, S. (2004). Wireless sensor
networks for emergency medical care, GE global conference, Harvard University and Boston
University School of Medicine, Boston, MA.
Acknowledgement
I would like to thank everyone who helped to make this possible.
First and foremost, I owe immense gratitude to my thesis supervisor,
Dr Asad Khan, for his support and kind contributions.
Thank You.