MapReduce on Matlab
Erum Afzal
MapReduce is a programming model devised at Google to facilitate the processing of large data sets. For example, it is used at Google for indexing websites.
• Matlab is a software package that provides a technical computing environment.
• It is widely used for numerical manipulation, simulations, and data analysis.
MapReduce on Matlab
• MapReduce on Matlab allows Matlab users to apply the MapReduce framework to their own data-processing requirements, such as data mining tasks over dense, detailed digital images. Similarly, if Matlab files could be imported into the MapReduce framework, many Matlab functionalities could be processed on Hadoop as well.
Working of MapReduce
• With MapReduce, data can be processed using multiple processors in parallel. This makes it possible to:
• Handle large volumes of input data.
• Speed up processing through parallelization.
• Map step: each piece of input data, identified by a key and a value, is mapped to one or more intermediate key/value pairs.
• Reduce step: each worker processes a part of the intermediate key/value pairs to generate the final key/value pairs.
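The map/shuffle/reduce pipeline described above can be sketched in a few lines of Python (used here only for illustration; the names `map_fn`, `reduce_fn`, and `map_reduce` are my own, not from the source):

```python
from collections import defaultdict

def map_fn(_, line):
    # Map: emit an intermediate (word, 1) pair for every word in the line.
    for word in line.split():
        yield word, 1

def reduce_fn(word, counts):
    # Reduce: sum the counts collected for one intermediate key.
    return word, sum(counts)

def map_reduce(inputs, map_fn, reduce_fn):
    # Shuffle: group intermediate values by their key.
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in map_fn(key, value):
            groups[k].append(v)
    # Reduce each group to a final key/value pair.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

lines = [(0, "the cat sat"), (1, "the cat ran")]
result = map_reduce(lines, map_fn, reduce_fn)
print(result)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In a real MapReduce system the map and reduce calls run on different machines; this sequential version only shows the data flow.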
Working of Matlab
The Matlab Parallel Computing Toolbox offers the framework to write programs for a cluster of computers. This enables a master computer to dispatch jobs to workers running on a cluster such as McGill's.
• The master creates a MapReduce job and passes the user-defined Map and Reduce functions to the workers.
• At each worker, the input key/value pairs are fed into the map function to get intermediate key/value pairs.
• At each worker, the intermediate key/value pairs are fed into the reduce function to get the final key/value pairs, which form the output.
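A minimal single-machine sketch of this master/worker flow, with Python's standard thread pool standing in for the toolbox's cluster workers (an assumption for illustration, not the toolbox API):

```python
from concurrent.futures import ThreadPoolExecutor
from collections import defaultdict

def worker_map(chunk):
    # Worker side: map one input chunk to intermediate (key, value) pairs.
    return [(word, 1) for word in chunk.split()]

def worker_reduce(item):
    # Worker side: reduce one intermediate key to a final pair.
    key, values = item
    return key, sum(values)

chunks = ["the cat sat", "the cat ran"]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Master: dispatch map tasks, shuffle locally, dispatch reduce tasks.
    intermediate = pool.map(worker_map, chunks)
    groups = defaultdict(list)
    for pairs in intermediate:
        for k, v in pairs:
            groups[k].append(v)
    final = dict(pool.map(worker_reduce, groups.items()))
print(final)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```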
Orthogonal Matching Pursuit
• For example, a sparse signal x can be stored compactly by multiplying it with a measurement matrix A:
• y = Ax
• The measurements y can then be used to recover x using OMP.
Application with MapReduce
• OMP becomes slow as A grows larger in size. This problem can be solved by processing individual slices of A in parallel using the MapReduce method.
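One way the slicing can work, sketched in Python/NumPy: the map step finds the best-matching column within each column slice of A, and the reduce step picks the global winner. The slice width, data, and helper names are illustrative assumptions:

```python
import numpy as np

def map_slice(slice_id, A_slice, residual):
    # Map: each worker scans its own column slice of A for the column
    # most correlated with the current residual.
    corr = np.abs(A_slice.T @ residual)
    j_local = int(np.argmax(corr))
    return slice_id, (j_local, corr[j_local])

def reduce_best(results, slice_width):
    # Reduce: pick the globally best column across all slices and
    # translate its local index back to a global column index.
    slice_id, (j_local, _) = max(results, key=lambda r: r[1][1])
    return slice_id * slice_width + j_local

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 40))
residual = A[:, 25] * 10.0            # residual aligned with column 25
width = 10
slices = [A[:, i:i + width] for i in range(0, 40, width)]
results = [map_slice(i, s, residual) for i, s in enumerate(slices)]
best = reduce_best(results, width)    # selects column 25
```

Each map call touches only its slice of A, so the dominant cost of an OMP iteration is spread across workers.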
• MapReduce was implemented on Matlab and was used to run Orthogonal Matching Pursuit.
• MapReduce on Matlab has the potential to improve the performance of numerous parallel processing algorithms by bringing the power of the MapReduce programming model to Matlab.
Singular Value Decomposition (SVD)
The Singular Value Decomposition (SVD) is a powerful matrix decomposition frequently used for dimensionality reduction. SVD is widely used in least squares problems, linear systems, and finding a low-rank representation of a matrix. A wide range of applications uses SVD as its main algorithmic building block.
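For instance, the best rank-k approximation of a matrix in the Frobenius norm is obtained by truncating its SVD (Eckart-Young); a small NumPy illustration, with sizes chosen only for the example:

```python
import numpy as np

def low_rank(M, k):
    # Rank-k approximation via truncated SVD: keep only the k largest
    # singular values and their singular vectors.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

rng = np.random.default_rng(0)
# An exactly rank-2 matrix: the rank-2 truncated SVD recovers it exactly.
M = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))
M2 = low_rank(M, 2)
print(np.allclose(M, M2))  # True
```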
Finding patterns in large-scale graphs with millions or billions of edges is increasingly important in computer network security (intrusion detection, spam filtering) and in web applications.
One such setting is the estimation of the clustering coefficient and the transitivity ratio of a graph, which effectively translates into computing the number of triangles each node participates in, or the total number of triangles in the graph, respectively.
Triangles are a frequently used network statistic in the exponential random graph model and naturally appear in models of real-world network evolution. They have been used in several applications such as spam detection, uncovering the hidden thematic structure of the web, and link recommendation in online social networks.
It is worth noting that in social networks triangles have a natural interpretation, as
“friends of friends are frequently friends themselves.”
MATLAB implementation, rank-k approximation
function Delta = EigenTriangleLocal(A, k)  % A is the adjacency matrix, k is the required rank approximation
n = size(A,1);
Delta = zeros(n,1);                        % Preallocate space for Delta
opts.isreal = 1; opts.issym = 1;           % Specify that the matrix is real and symmetric
[u, l] = eigs(A, k, 'LM', opts);           % Compute top-k eigenvalues and eigenvectors of A
l = diag(l)';
for j = 1:n
    Delta(j) = sum(l.^3 .* u(j,:).^2) / 2; % Triangles that node j participates in
end
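To illustrate the formula, here is a small Python/NumPy check (not the author's code) on a 4-clique, where every node participates in exactly 3 triangles; with a full-rank eigendecomposition the spectral count matches the exact count from diag(A³)/2:

```python
import numpy as np

# Adjacency matrix of K4: every pair of the 4 nodes is connected.
A = np.ones((4, 4)) - np.eye(4)

# Exact per-node triangle count: diag(A^3) / 2.
exact = np.diag(np.linalg.matrix_power(A.astype(int), 3)) / 2

# Spectral count: Delta_j = sum_i lambda_i^3 * u_{j,i}^2 / 2.
lam, U = np.linalg.eigh(A)
spectral = (U**2 @ lam**3) / 2

print(exact)                         # [3. 3. 3. 3.]
print(np.allclose(spectral, exact))  # True
```

EIGENTRIANGLELOCAL exploits the fact that in real-world graphs a small k already captures most of the cubed spectrum, so the full decomposition is unnecessary.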
Summary of network data
• In this work the EIGENTRIANGLE and EIGENTRIANGLELOCAL algorithms have been proposed to estimate the total number of triangles and the number of triangles per node, respectively, in an undirected, unweighted graph. The special spectral properties that real-world networks frequently possess make both algorithms efficient for the triangle counting problem.
Fast Randomized Tensor Decompositions
• Many real-world problems involve multi-aspect data. For example, fMRI (functional magnetic resonance imaging) scans, one of the most popular neuroimaging techniques, result in multi-aspect data: voxels × subjects × trials × task conditions × timeticks. Monitoring systems result in three-way data: machine id × type of measurement × timeticks. Depending on the setting, the machine can be, for instance, a sensor (sensor networks) or a computer (computer networks). Large data volumes generated by personalized web search are frequently modeled as three-way tensors, i.e., users × queries × web pages.
• Analyzing all of the above is quite a time-consuming task.
• Ignoring the multi-aspect nature of the data by flattening it into a two-way matrix and applying an exploratory analysis algorithm, e.g., the singular value decomposition (SVD), is not optimal and typically hurts the quality of the analysis significantly.
• The same problem holds when applying, e.g., SVD to different two-way slices of the tensor, as observed by [94]. On the contrary, multiway data analysis techniques succeed in capturing the multilinear structures in the data, thus achieving better performance than the aforementioned approaches.
Problem Solution
Tensor decompositions have found use as a solution in many applications across different scientific disciplines, especially in computer vision and signal processing, as well as neuroscience, time-series anomaly detection, psychometrics, graph analysis, and data mining.
Algorithm 8 MACH-HOSVD
• Tensor decompositions are useful in many real-world problems. A simple randomized algorithm, MACH, is proposed, which is easily parallelizable and adapts to online streaming systems.
• This algorithm will be incorporated into the PEGASUS library, a graph and tensor mining system for handling large amounts of data.
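For reference, a plain (non-randomized) HOSVD can be sketched in Python/NumPy; MACH accelerates this kind of decomposition by sparsifying the tensor before the per-mode SVDs. The helper names and tensor sizes below are my own illustrative choices:

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: move axis `mode` to the front and flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, M, mode):
    # Multiply tensor T by matrix M along the given mode.
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    # HOSVD: factor matrices are the leading left singular vectors of each
    # mode-n unfolding; the core is T multiplied by each U_n^T.
    Us = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        Us.append(U[:, :r])
    core = T
    for mode, U in enumerate(Us):
        core = mode_multiply(core, U.T, mode)
    return core, Us

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))
core, Us = hosvd(T, (4, 5, 6))  # full ranks: reconstruction is exact
recon = core
for mode, U in enumerate(Us):
    recon = mode_multiply(recon, U, mode)
print(np.allclose(recon, T))  # True
```

With truncated ranks the same code yields a compressed core tensor, which is the typical use in data mining.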
More Applications
• Comparing the Performance of Clusters,
Hadoop, and Active Disks on Microarray
Correlation Computations.
• Beyond Online Aggregation: Parallel and
Incremental Data Mining with Online MapReduce (DRAFT).
• Map-Reduce for Machine Learning on Multicore.
• Charalampos E. Tsourakakis, “Data Mining with MapReduce: Graph and Tensor Algorithms with Applications”, March 2010.
• Arjita Madan, “MapReduce on Matlab”.