6. Files of (horizontal) Records
The concept of pages or blocks suffices for doing I/O, but the higher layers of a DBMS operate on records and files of records.
• FILE: A collection of pages, each containing a collection of records, which supports:
  – insert, delete, and modify (on a record)
  – read a particular record (specified using its Record ID, or RID)
  – scan of all records (possibly with some conditions on the records to be retrieved)
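A minimal in-memory sketch of this record-file interface, written in Python; the class and method names are illustrative assumptions, not any particular DBMS's API:

    # A toy stand-in for the FILE abstraction above.
    # RIDs are (page_no, slot_no) pairs, as in the slides.

    class RecordFile:
        def __init__(self, slots_per_page=4):
            self.slots_per_page = slots_per_page
            self.pages = [[]]                        # each page is a list of record slots

        def insert(self, record):
            if len(self.pages[-1]) == self.slots_per_page:
                self.pages.append([])                # allocate a new page when the last is full
            page_no = len(self.pages) - 1
            self.pages[page_no].append(record)
            return (page_no, len(self.pages[page_no]) - 1)   # the record's RID

        def read(self, rid):
            page_no, slot_no = rid
            return self.pages[page_no][slot_no]

        def delete(self, rid):
            page_no, slot_no = rid
            self.pages[page_no][slot_no] = None      # leave the slot empty (see Heap File Facts)

        def modify(self, rid, record):
            page_no, slot_no = rid
            self.pages[page_no][slot_no] = record

        def scan(self, predicate=lambda r: True):
            for page in self.pages:                  # scan of all records, optional condition
                for record in page:
                    if record is not None and predicate(record):
                        yield record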
File Types
• The three basic file organizations supported by the File Manager of most DBMSs are:
  – HEAP FILES (files of unordered records)
  – SORTED or CLUSTERED FILES (records sorted or clustered on some field(s))
  – HASHED FILES (files in which records are positioned based on a hash function on some field(s))
Unordered (Heap) Files
• Simplest file structure contains records in no particular order.
• As file grows and shrinks, disk pages are allocated and deallocated.
• To support record-level operations, the DBMS must:
  – keep track of the pages in a file
  – keep track of free space on pages
  – keep track of the records on a page
• There are many alternatives for keeping track of these.
Heap File Implemented as a Linked List
[Figure: a Header Page anchors two doubly-linked lists of data pages: one list of Full Pages and one list of Pages with Free Space.]
• The header page id and the heap file name must be stored someplace.
• Each page contains 2 `pointers' plus data.
Heap File Using a Page Directory
[Figure: a DIRECTORY (a linked list of header blocks containing page IDs) whose entries point to Data Page 1, Data Page 2, ..., Data Page N.]
• The entry for a page can include the number of free bytes on the page.
• The directory is itself a collection of pages; the linked-list implementation is just one alternative.
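A sketch of how a page directory with free-byte counts can be used to find a page with room for a new record; Python, with illustrative names that are assumptions, not a real DBMS interface:

    # Toy page directory for a heap file: one entry per data page,
    # recording the page id and the number of free bytes on that page.

    class DirectoryEntry:
        def __init__(self, page_id, free_bytes):
            self.page_id = page_id
            self.free_bytes = free_bytes

    class PageDirectory:
        def __init__(self):
            self.entries = []                 # in a real DBMS this is itself a set of pages

        def add_page(self, page_id, page_size):
            self.entries.append(DirectoryEntry(page_id, page_size))

        def find_page_with_space(self, record_size):
            # Scan only the directory (not the data pages) for a page with enough room.
            for entry in self.entries:
                if entry.free_bytes >= record_size:
                    return entry.page_id
            return None                       # caller must allocate a new page

        def note_insert(self, page_id, record_size):
            for entry in self.entries:
                if entry.page_id == page_id:
                    entry.free_bytes -= record_size
                    return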
Heap File Facts
Record insert, Method-1: the system inserts new records at the end of the file (this needs a Next Open Slot indicator), moves the last record into the freed slot following a deletion, and updates the indicator.
- Moving records this way does not allow support of the RID or RRN concept. Alternatively, a deleted record's slot can remain empty (until the file is reorganized).
- If records are only moved into freed slots upon reorganization, then RIDs and RRNs can be supported.
[Figure: a page with six record slots; slots 0-2 hold records, slots 3-5 are empty, and a Next Open Slot indicator at the end of the page holds the value 3.]
Heap File Facts
Record insert, Method-2: insert into any open slot. The DBMS must maintain a data structure indicating the open slots, either as a list of open slots or as a bit filter (bit map) that identifies them.
[Figure: a page with six record slots, three of which hold records; availability bit filter 101001, where 0 means the slot is available.]
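A small sketch of Method-2's bookkeeping, assuming a per-page availability bit filter like the one above (Python; class and method names are made up for illustration):

    # One page with a fixed number of slots and an availability bit filter:
    # bit = 1 means the slot holds a record, bit = 0 means the slot is available.

    class SlottedPage:
        def __init__(self, num_slots=6):
            self.slots = [None] * num_slots
            self.bits = [0] * num_slots          # the availability bit filter

        def insert(self, record):
            for slot_no, bit in enumerate(self.bits):
                if bit == 0:                      # first open slot
                    self.slots[slot_no] = record
                    self.bits[slot_no] = 1
                    return slot_no                # slot number becomes part of the RID
            return None                           # page is full

        def delete(self, slot_no):
            self.slots[slot_no] = None
            self.bits[slot_no] = 0                # other records' RIDs are unaffected

    page = SlottedPage()
    for r in ("rec0", "rec1", "rec2"):
        page.insert(r)
    page.delete(1)
    print("".join(map(str, page.bits)))           # prints 101000 in this toy run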
If we want all records with a given value in a particular field, we need an "index".
Of course, index files must provide a fast way to find the particular value entries of interest (the heap file organization would make little sense for index files).
Index files are usually sorted files. Indexes are examples of ACCESS PATHS.
Sorted File (Clustered File) Facts
File is sorted on one attribute (e.g., using the unpacked, record-pointer page format).
Advantages over a heap file include:
- reading records in that particular order is efficient
- finding the next record in order is efficient
For efficient "value-based" ordering (clustering), a level of indirection is useful (the unpacked, record-pointer page format).
What happens when a page fills up?
[Figure: page 3 in the unpacked, record-pointer page format. Slots 0-5 hold RID(3,3), RID(3,0), RID(3,4), RID(3,2), RID(3,1), RID(3,8); the slot directory reads 5 2 0 3 4 1.]
Use an overflow page for the next record?
[Figure: overflow page 9 holds RID(3,6) in slot 0; slots 1-5 are empty.]
When a page fills up and, e.g., a record must be inserted and clustered between (3,1) and (3,5), one solution is to simply place it on an overflow page in arrival order. Then the overflow page is scanned like an unordered file page, when necessary. Periodically the primary and overflow pages can be reorganized as an unpacked, record-pointer extent to improve sequential access speed (see the next slide for an example).
Sorted File (Clustered File) Facts
Reorganizing a sorted file with several overflow levels. BEFORE:
[Figure: page 3 holds RID(3,3), RID(3,0), RID(3,4), RID(3,2), RID(3,1), RID(3,8), slot directory 5 2 0 3 4 1; overflow page 9 holds RID(3,6), RID(3,9), RID(3,5), RID(3,11), RID(3,10), RID(3,15), slot directory 5 3 4 1 0 2; overflow page 2 holds RID(3,7), slot directory 0.]
AFTER:
[Figure: page 3 holds RID(3,3), RID(3,0), RID(3,4), RID(3,2), RID(3,1), RID(3,5), slot directory 5 2 0 3 4 1; overflow page 9 holds RID(3,6), RID(3,9), RID(3,8), RID(3,11), RID(3,10), RID(3,7), slot directory 3 4 1 2 5 0; overflow page 2 holds RID(3,15), slot directory 0.]
Here, reorganization requires only 2 record swaps and 1 slot-directory rewrite.
Hash files
A hash function is applied to the key of a record to determine which "file bucket" it goes to ("file buckets" are usually the pages of that file). Assume there are M pages, numbered 0 through M-1. Then the hash function can be any function that converts the key to a number between 0 and M-1 (e.g., for numeric keys, mod M is typically used; for non-numeric keys, first map the key value to a number and then apply mod M).
Collisions or overflows can occur (when a new record hashes to a bucket that is already full). The simplest overflow method is to use separate overflow pages: overflow pages are allocated as needed, either as a separate linked list for each bucket (page numbers are needed for the pointers) or as a single shared linked list.
Long overflow chains can develop and degrade performance.
– Extendible and Linear Hashing are dynamic techniques that fix this problem.
[Figure: e.g., h(key) mod M. A key is passed through h to select one of the primary bucket pages 0, 1, 2, ..., M-1; overflow pages hang off the buckets, either as separate linked lists or as a single shared linked list.]
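A sketch of a static hash file with M primary buckets and separate overflow chains; Python, with the bucket capacity and all names chosen just for illustration:

    # Static hashing: h(key) mod M picks the primary bucket page.
    # When a primary bucket is full, records go to that bucket's overflow chain.

    M = 5                      # number of primary bucket pages
    BUCKET_CAPACITY = 4        # records per page (blocking factor)

    primary = [[] for _ in range(M)]    # primary bucket pages
    overflow = [[] for _ in range(M)]   # one separate overflow chain per bucket

    def h(key):
        return key % M

    def insert(key, record):
        b = h(key)
        if len(primary[b]) < BUCKET_CAPACITY:
            primary[b].append((key, record))
        else:
            overflow[b].append((key, record))   # long chains here degrade performance

    def search(key):
        b = h(key)
        for k, rec in primary[b] + overflow[b]:
            if k == key:
                return rec
        return None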
Other Static Hashing overflow handling methods
Overflow can also be handled by open addressing (more commonly used for internal hash tables, where a bucket is an allocation of main memory, not a page).
In open addressing, upon collision, search forward in the bucket sequence for the next open record slot.
[Figure: e.g., h(key) mod M with bucket pages 0-6. A record's key hashes to bucket 1: Collision! Probe bucket 2? no; bucket 3? yes, so the record is placed in bucket 3.]
Then to search, apply h. If the record is not found there, search sequentially ahead until it is found (wrapping around, and stopping if the search returns to its starting point).
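A sketch of open addressing with linear probing as described above; Python, simplified to one record per bucket slot purely to keep the example short:

    # Open addressing: on collision, probe forward (wrapping around) for the next open slot.

    M = 7
    table = [None] * M          # one record slot per bucket in this simplified sketch

    def insert(key, record):
        b = key % M
        for i in range(M):                       # probe at most M slots
            slot = (b + i) % M                   # wrap around at the end of the table
            if table[slot] is None:
                table[slot] = (key, record)
                return slot
        raise RuntimeError("table full")

    def search(key):
        b = key % M
        for i in range(M):
            slot = (b + i) % M
            if table[slot] is None:              # an empty slot ends the probe sequence
                return None
            if table[slot][0] == key:
                return table[slot][1]
        return None                              # probed all the way around: not found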
Other overflow handling methods
Overflow can also be handled by re-hashing.
In re-hashing, upon collision, apply the next hash function from a sequence of hash functions: h0(key), then h1, then h2, ...
[Figure: a key is passed through the hash functions to select one of bucket pages 0-6.]
Then to search, apply h. If the record is not found, apply the next hash function, and so on, until it is found or the list of hash functions is exhausted.
These methods can also be combined.
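A sketch of the re-hashing approach, using a fixed sequence of hash functions h0, h1, h2, ...; the particular functions below are placeholders chosen only to make the Python example runnable:

    # Re-hashing: on collision, try the next hash function in a fixed sequence.

    M = 7
    BUCKET_CAPACITY = 2
    buckets = [[] for _ in range(M)]

    # An illustrative sequence of hash functions h0, h1, h2, h3.
    hash_functions = [lambda k, i=i: (k + i * i) % M for i in range(4)]

    def insert(key, record):
        for h in hash_functions:                 # h0 first, then h1, then h2, ...
            b = h(key)
            if len(buckets[b]) < BUCKET_CAPACITY:
                buckets[b].append((key, record))
                return b
        raise RuntimeError("all hash functions exhausted")

    def search(key):
        for h in hash_functions:                 # try each function until found or exhausted
            for k, rec in buckets[h(key)]:
                if k == key:
                    return rec
        return None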
Extendible Hashing
Idea: Use a directory of pointers to buckets,
• split just the bucket that overflowed
• double the directory when needed
Directory is much smaller than the file, so doubling it is cheap.
Only one page of data entries is split. No overflow pages!
The trick lies in how the hash function is adjusted!
Example
Blocking factor (bfr) = 4 (# entries per bucket). For simplicity we let h(r) = r here.
[Figure: directory with GLOBAL DEPTH gd = 2 and entries 00, 01, 10, 11 pointing to the data pages: Bucket A (LOCAL DEPTH 2) holds 4* 12* 32* 16*, Bucket B (local depth 2) holds 1* 5* 21* 13*, Bucket C (local depth 2) holds 10*, Bucket D (local depth 2) holds 15* 7* 19*.]
To find the bucket for a new key value, r, take just the last global-depth bits of h(r), not all of them (the last 2 bits in this example). E.g., h(5) = 5 = 101 binary, thus it is in the bucket pointed to in the directory by 01.
Local depth of a bucket: # of bits used to determine if an entry belongs to the bucket.
Global depth of the directory: max # of bits needed to tell which bucket an entry belongs to (= max of the local depths).
Search: apply the hash function, h, to the key value, r, and follow the pointer for the last 2 bits of h(r).
Insert: if the bucket is full, split it (allocate 1 new page and redistribute the entries over those 2 pages).
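A sketch of the lookup rule just described (take the last global-depth bits of h(r) to index the directory); Python, with the buckets and directory of the example hard-coded for illustration:

    # Extendible hashing lookup: the directory has 2**global_depth entries,
    # indexed by the last global_depth bits of h(r).

    def h(r):
        return r                        # as in the example, h(r) = r for simplicity

    def lookup_bucket(directory, global_depth, r):
        mask = (1 << global_depth) - 1          # keeps only the last global_depth bits
        return directory[h(r) & mask]

    # Directory from the example (global depth 2): entries 00, 01, 10, 11
    bucket_A = [4, 12, 32, 16]
    bucket_B = [1, 5, 21, 13]
    bucket_C = [10]
    bucket_D = [15, 7, 19]
    directory = [bucket_A, bucket_B, bucket_C, bucket_D]

    print(lookup_bucket(directory, 2, 5))       # 5 = 101 binary, last 2 bits 01 -> bucket B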
Example: how did we get there?
[Figure: GLOBAL DEPTH gd = 1; directory entries 0 and 1; Bucket A (LOCAL DEPTH 1) holds 4*; Bucket B (local depth 1) is empty.]
First insert is 4: h(4) = 4 = 100 binary, so it goes in the bucket pointed to by 0 in the directory.
Example
[Figure: global depth 1; Bucket A (local depth 1) holds 4* 12* 32* 16*; Bucket B (local depth 1) holds 1*.]
Insert 12, 32, 16 and 1:
h(12) = 12 = 1100 binary, in the bucket pointed to in the directory by 0.
h(32) = 32 = 100000 binary, in the bucket pointed to in the directory by 0.
h(16) = 16 = 10000 binary, in the bucket pointed to in the directory by 0.
h(1) = 1 = 1 binary, in the bucket pointed to in the directory by 1.
Example
[Figure: global depth 1; Bucket A (local depth 1) holds 4* 12* 32* 16*; Bucket B (local depth 1) holds 1* 5* 21* 13*.]
Insert 5, 21 and 13:
h(5) = 5 = 101 binary, in the bucket pointed to in the directory by 1.
h(21) = 21 = 10101 binary, in the bucket pointed to in the directory by 1.
h(13) = 13 = 1101 binary, in the bucket pointed to in the directory by 1.
Example
9th insert: 10. h(10) = 10 = 1010 binary, in the bucket pointed to in the directory by 0. Collision!
Split bucket A into A and C. Double the directory (by copying what is there and adding a bit on the left). Reset one pointer. Redistribute values among A and C if necessary; not necessary this time, since all the 2's-position bits are already correct:
4 = 100, 12 = 1100, 32 = 100000, 16 = 10000, 10 = 1010.
[Figure: GLOBAL DEPTH gd = 2; directory entries 00, 01, 10, 11; Bucket A (LOCAL DEPTH 2) holds 4* 12* 32* 16*; Bucket B (local depth 1) holds 1* 5* 21* 13*; Bucket C (local depth 2) holds 10*.]
Example
Insert 15, 7 and 19:
h(15) = 15 = 1111 binary
h(7) = 7 = 111 binary
h(19) = 19 = 10011 binary
Split bucket B into B and D. No need to double the directory, because the local depth of B is less than the global depth. Reset one pointer, and redistribute values among B and D (if necessary; not necessary this time). Reset the local depths of B and D.
[Figure: global depth 2; directory entries 00, 01, 10, 11; Bucket A (local depth 2) holds 4* 12* 32* 16*; Bucket B (local depth 2) holds 1* 5* 21* 13*; Bucket C (local depth 2) holds 10*; Bucket D (local depth 2) holds 15* 7* 19*.]
Insert 20: h(20) = 20 = 10100 binary, and the bucket pointed to by 00 is full!
Split A. Double the directory and reset 1 pointer. Redistribute the contents of A.
[Figure: GLOBAL DEPTH becomes 3; directory entries 000, 001, 010, 011, 100, 101, 110, 111; Bucket A (LOCAL DEPTH becomes 3) held 4* 12* 32* 16*; Bucket B (local depth 2) holds 1* 5* 21* 13*; Bucket C (local depth 2) holds 10*; Bucket D (local depth 2) holds 15* 7* 19*; Bucket E (local depth 3, the `split image' of Bucket A) receives 4* 12*.]
Points to Note
• 20 = 10100 binary. The last 2 bits (00) tell us r belongs in either A or A2 (its split image), but not which one. The last 3 bits are needed to tell which one.
  – Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
  – Global depth of directory: max # of bits needed to tell which bucket an entry belongs to (= max of local depths).
• When does a bucket split cause directory doubling?
  – Before the insert, the local depth of the bucket = global depth. The insert causes the local depth to become > global depth; the directory is doubled by copying it over and `fixing' the pointer to the split-image page.
  – Use of the least-significant bits enables efficient doubling via copying of the directory!
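A simplified sketch of the insert/split logic described in these points, in Python; the class layout and names are assumptions made for illustration, and h(key) = key as in the example slides:

    BFR = 4                                          # blocking factor: entries per bucket

    class Bucket:
        def __init__(self, local_depth):
            self.local_depth = local_depth
            self.entries = []

    class ExtendibleHash:
        def __init__(self):
            self.global_depth = 1
            self.directory = [Bucket(1), Bucket(1)]  # entries 0 and 1

        def _index(self, key):
            return key & ((1 << self.global_depth) - 1)   # last global_depth bits of h(key)

        def insert(self, key):
            bucket = self.directory[self._index(key)]
            if len(bucket.entries) < BFR:
                bucket.entries.append(key)
                return
            self._split(bucket)
            self.insert(key)                         # retry after the split

        def _split(self, bucket):
            if bucket.local_depth == self.global_depth:
                self.directory = self.directory * 2  # double by copying
                self.global_depth += 1
            bucket.local_depth += 1
            image = Bucket(bucket.local_depth)       # the bucket's `split image'
            bit = 1 << (bucket.local_depth - 1)      # the new distinguishing bit
            old, bucket.entries = bucket.entries, []
            for key in old:                          # redistribute over the 2 pages
                (image if key & bit else bucket).entries.append(key)
            for i, b in enumerate(self.directory):   # fix the pointers to the split image
                if b is bucket and (i & bit):
                    self.directory[i] = image

    # Replaying the example's inserts ends with gd = 3 and buckets
    # {32,16}, {1,5,21,13}, {10}, {15,7,19} and {4,12,20}, as in the slides.
    eh = ExtendibleHash()
    for k in (4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19, 20):
        eh.insert(k)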
Comments on Extendible Hashing
• If the directory fits in memory, an equality search is answered with one disk access; else two.
  – The directory grows in spurts and, if the distribution of hash values is skewed, the directory can grow large.
  – Multiple entries with the same hash value cause problems!
• Delete: if removal of a data entry makes a bucket empty, it can be merged with its `split image'.
  – As soon as each directory element points to the same bucket as its (merged) split image, the directory can be halved.
Linear Hash File
Starts with M buckets (numbered 0, 1, ..., M-1) and an initial hash function h0 = mod M (or, more generally, h0(key) = h(key) mod M for any hash function h that maps into the integers).
Use chaining to shared overflow pages to handle overflows.
At the first overflow, split bucket 0 into bucket 0 and bucket M, and rehash bucket 0's records using h1 = mod 2M. Henceforth, if h0 yields the value 0, rehash using h1 = mod 2M.
At the next overflow, split bucket 1 into bucket 1 and bucket M+1, and rehash bucket 1's records using h1 = mod 2M. Henceforth, if h0 yields the value 1, use h1.
...
When all of the original M buckets have been split (M collisions), rehash all overflow records using h1. Relabel h1 as h0 (discarding the old h0 forever) and start a new "round" by repeating the process above for all future collisions (i.e., now there are buckets 0, ..., 2M-1 and h0 = mod 2M).
To search for a record, let n = the index of the last bucket split so far in the given round; if h0(key) is not greater than n, use h1, else use h0.
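A sketch of this "which hash function do I use?" rule in Python; n is the index of the last bucket split in the current round and M the number of buckets at the start of the round (names are illustrative):

    # Linear hashing address calculation for the current round.
    # Buckets 0..n have already been split, so keys whose h0 value falls in that
    # range must be re-addressed with h1 = mod 2M.  Use n = -1 before the first split.

    def bucket_for(key, M, n):
        b = key % M                # h0 = mod M
        if b <= n:                 # this bucket has already been split this round
            b = key % (2 * M)      # h1 = mod 2M
        return b

    # Round-1 checks against the example on the next slides (M = 5):
    print(bucket_for(15, 5, 0))    # h0(15) = 0 <= n = 0, so h1(15) = 15 mod 10 = 5
    print(bucket_for(36, 5, 3))    # h0(36) = 1 <= n = 3, so h1(36) = 6
    print(bucket_for(24, 5, 3))    # h0(24) = 4 > n, so bucket 4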
Linear Hash example, M = 5
Insert 27|JONES|MHD|MN: h0(27) = mod5(27) = 2. Collision! Split bucket 0 into buckets 0 and 5, rehash bucket 0's records with mod 10; n = 0.
Insert 8|SINGH|FGO|ND: h0(8) = mod5(8) = 3.
Insert 15|LOWE|ZAP|ND: h0(15) = mod5(15) = 0, which is not greater than n, so use h1: mod10(15) = 5.
Insert 32|FARNS|BEEP|NY: h0(32) = mod5(32) = 2. Collision! Split bucket 1 into buckets 1 and 6, rehash with mod 10; n = 1.
Insert 39|TULIP|DERLK|IN: h0(39) = mod5(39) = 4. Collision! Split bucket 2 into buckets 2 and 7, rehash with mod 10; n = 2.
Insert 31|ROSE|MIAME|OH: h0(31) = mod5(31) = 1, which is not greater than n, so use h1: mod10(31) = 1. Collision! Split bucket 3 into buckets 3 and 8, rehash with mod 10; n = 3.
Insert 36|SCHOTZ|CORN|IA: h0(36) = mod5(36) = 1, which is not greater than n, so use h1: mod10(36) = 6.
[Figure: the bucket-to-page table maps buckets 0-8 to pages 45, 99, 23, 78, 98, 21, 101, 104, 105. The data pages hold records such as 15|LOWE|ZAP|ND, 02|BAID, 22|ZHU, 25|CLAY|OUTBK|NJ, 33|GOOD|GATER|FL, 8|SINGH|FGO|ND, 11|BROWN, 21|BARBIE, 36|SCHOTZ|CORN|IA, 14|THAISZ|KNOB|NJ and 24|CROWE. A shared overflow (OF) page holds the collided records 27|JONES|MHD|MN, 32|FARNS|BEEP|NY, 39|TULIP|DERLK|IN and 31|ROSE|MIAME|OH.]
Linear Hash example, 2nd round: M = 10, h0 = mod 10
Rehash the first-round overflow records with the new h0 = mod 10:
h0(27) = 7; h0(32) = 2. Collision! Rehash with mod 20. h0(39) = 9; h0(31) = 1. Collision! Rehash with mod 20.
Insert 10|RADHA|FGO|ND: h0(10) = mod10(10) = 0. ETC.
[Figure: the bucket-to-page table now maps buckets 0-10 to pages 45, 99, 23, 78, 98, 21, 101, 104, 105, 109, 110. The data pages hold records such as 25|CLAY|OUTBK|NJ, 15|LOWE|ZAP|ND, 02|BAID, 22|ZHU, 10|RADHA|FGO|ND, 33|GOOD|GATER|FL, 14|THAISZ|KNOB|NJ, 24|CROWE|SJ|CA, 11|BROWN, 21|BARBIE, 36|SCHOTZ|CORN|IA and 8|SINGH|FGO|ND; the OVERFLOW page holds 27|JONES|MHD|MN, 32|FARNS|BEEP|NY, 39|TULIP|DERLK|IN and 31|ROSE|MIAME|OH.]
Summary
• Hash-based indexes: best for equality searches; cannot support range searches.
• Static Hashing can lead to performance degradation due to collision-handling problems.
• Extendible Hashing avoids performance problems by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.)
  – A directory keeps track of the buckets; it doubles periodically.
  – It can get large with skewed data; additional I/O if it does not fit in main memory.
Summary
• Linear Hashing avoids the directory by splitting buckets round-robin and using overflow pages.
  – Overflow chains are not likely to be long.
  – Duplicates are handled easily.
  – Space utilization could be lower than Extendible Hashing, since splits are not concentrated on `dense' data areas.
• Skew occurs when the hash values of the data entries are not uniform!
[Figure: three histograms over values v1, v2, v3, v4, v5, ..., vn, illustrating distribution skew, count skew, and combined distribution & count skew.]
Map Reduce (from Wikipedia)
MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes).
Computational processing can occur on data stored either in a file system (unstructured) or in a database (structured).
• "Map" step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back.
• "Reduce" step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
MapReduce allows for distributed processing of the map and reduction operations. Provided each mapping operation is independent of the others,
all maps can be performed in parallel. Similarly, a set of 'reducers' can perform the reduction phase - provided all outputs of the map operation
that share the same key are presented to the same reducer at the same time, or if the reduction function is associative.
Another way to look at MapReduce is as a 5-step parallel and distributed computation:
1. Prepare the Map() input – the "MapReduce system" designates Map processors, assigns the K1 input key value each processor would work on, and provides that processor with all the input data associated with that key value.
2. Run the user-provided Map() code – Map() is run exactly once for each K1 key value, generating output organized by key values K2.
3. "Shuffle" the Map output to the Reduce processors – the MapReduce system designates Reduce processors, assigns the K2 key value each processor would work on, and provides that processor with all the Map-generated data associated with that key value.
4. Run the user-provided Reduce() code – Reduce() is run exactly once for each K2 key value produced by the Map step.
5. Produce the final output – the MapReduce system collects all the Reduce output, and sorts it by K2 to produce the final outcome.
Logically these 5 steps can be thought of as running in sequence – each step starts only after the previous step is completed – though in practice, of course, they can be intertwined, as long as the final result is not affected.
In many situations the input data might already be distributed among many different servers, in which case step 1 could sometimes be greatly simplified by assigning Map servers that would process the locally present input data. Similarly, step 3 could sometimes be sped up by assigning Reduce processors that are as much as possible local to the Map-generated data they need to process.
The Map and Reduce functions of MapReduce are both defined with respect to data structured in (key, value) pairs. Map takes one pair of data with a type in one data domain, and returns a list of pairs in a different domain: Map(k1,v1) → list(k2,v2).
The Map function is applied in parallel to every pair in the input dataset. This produces a list of pairs for each call. After that, the MapReduce framework collects all pairs with the same key from all lists and groups them together, creating one group for each key.
The Reduce function is applied in parallel to each group, producing a collection of values in the same domain: Reduce(k2, list(v2)) → list(v3).
Each Reduce call typically produces either one value v3 or an empty return, though one call is allowed to return more than one value. The returns of all calls are collected as the desired result list.
Thus the MapReduce framework transforms a list of (key, value) pairs into a list of values.
The prototypical MapReduce example counts the appearance of each word in a set of documents
function map(String name, String document):
    // name: document name; document: document contents
    for each word w in document:
        emit (w, 1)

function reduce(String word, Iterator partialCounts):
    // word: a word; partialCounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialCounts:
        sum += ParseInt(pc)
    emit (word, sum)
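A runnable Python sketch of the same word-count logic, with the framework's shuffle (group-by-key) simulated in memory; the function names map_fn, reduce_fn and run_mapreduce are illustrative, not part of any real MapReduce API:

    from collections import defaultdict

    def map_fn(name, document):
        # name: document name; document: document contents
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, partial_counts):
        # word: a word; partial_counts: the list of partial counts for that word
        return (word, sum(partial_counts))

    def run_mapreduce(documents):
        # "Shuffle": group all map outputs by key, as the framework would.
        groups = defaultdict(list)
        for name, document in documents.items():
            for key, value in map_fn(name, document):
                groups[key].append(value)
        # Reduce each group.
        return [reduce_fn(word, counts) for word, counts in groups.items()]

    docs = {"d1": "the cat sat on the mat", "d2": "the dog"}
    print(run_mapreduce(docs))      # e.g. [('the', 3), ('cat', 1), ('sat', 1), ...]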
Here, each document is split into words, and each word is counted by the map function, using the word as the result key. The framework puts together all the pairs with the same key and feeds them to the same call to reduce, so this function just needs to sum all of its input values to find the total appearances of that word. As another example, imagine that for a database of 1.1 billion people, one would like to compute the average number of social contacts a person has according to age. In SQL such a query could be expressed as: SELECT age AS Y, AVG(contacts) AS A FROM social.person GROUP BY age ORDER BY age. Using MapReduce, the K1 key values could be the integers 1 through 1,100, each representing a batch of 1 million records, the K2 key value could be a person's age in years, and this computation could be achieved using the following functions:
function Map is
    input: integer K1 between 1 and 1100, representing a batch of 1 million social.person records
    for each social.person record in the K1 batch do
        let Y be the person's age
        let N be the number of contacts the person has
        produce one output record <Y,N>
    repeat
end function

function Reduce is
    input: age (in years) Y
    for each input record <Y,N> do
        Accumulate in S the sum of N
        Accumulate in C the count of records so far
    repeat
    let A be S/C
    produce one output record <Y,A>
end function
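The same computation as a small Python sketch, grouping the <Y,N> pairs by age and averaging, again with the shuffle simulated in memory; the tiny inline dataset is made up purely for illustration:

    from collections import defaultdict

    def map_batch(people):
        # people: one batch of (age, contacts) records
        for age, contacts in people:
            yield (age, contacts)              # one <Y, N> pair per person

    def reduce_age(age, contact_counts):
        s = sum(contact_counts)                # S: sum of N
        c = len(contact_counts)                # C: count of records
        return (age, s / c)                    # <Y, A> with A = S/C

    batches = [[(25, 100), (30, 80)], [(25, 140), (30, 120), (41, 10)]]
    groups = defaultdict(list)                 # the "shuffle": group <Y, N> pairs by Y
    for batch in batches:
        for age, n in map_batch(batch):
            groups[age].append(n)
    print(sorted(reduce_age(a, ns) for a, ns in groups.items()))
    # [(25, 120.0), (30, 100.0), (41, 10.0)]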
Map Reduce-2
The MapReduce system would line up the 1,100 Map processors and provide each with its corresponding 1 million input records. The Map step would produce 1.1 billion <Y,N> records, with Y values ranging between, say, 8 and 103. The MapReduce system would then line up the 96 Reduce processors by shuffling the key/value pairs (since we need one average per age) and provide each with its millions of corresponding input records. The Reduce step would result in the much-reduced set of only 96 output records <Y,A>, which would be put in the final result file, sorted by Y.
Dataflow
The frozen part of the MapReduce framework is a large distributed sort. The hot spots, which the application defines, are: an input reader, a Map function, a partition function, a compare function, a Reduce function, and an output writer.
Input reader: The input reader divides the input into appropriately sized 'splits' (in practice typically 16 MB to 128 MB) and the framework assigns one split to each Map function. The input reader reads data from stable storage (typically a distributed file system) and generates key/value pairs. A common example reads a directory full of text files and returns each line as a record.
Map function: The Map function takes a series of key/value pairs, processes each, and generates zero or more output key/value pairs. The input
and output types of the map can be (and often are) different from each other.
If the application is doing a word count, the map function would break the line into words and output a key/value pair for each word. Each output
pair would contain the word as the key and the number of instances of that word in the line as the value.
Partition function: Each Map function output is allocated to a particular reducer by the application's partition function for sharding purposes.
The partition function is given the key and the number of reducers and returns the index of the desired reducer.
A typical default is to hash the key and use the hash value modulo the number of reducers. It is important to pick a partition function that gives an
approximately uniform distribution of data per shard for load-balancing purposes, otherwise the MapReduce operation can be held up waiting for
slow reducers (reducers assigned more than their share of data) to finish.
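A sketch of the typical default partition function just described (hash the key, then take it modulo the number of reducers); the helper name is illustrative, and note that real frameworks use a deterministic hash rather than Python's per-process-salted hash():

    def default_partition(key, num_reducers):
        # Returns the index of the reducer that will receive this key.
        # All map outputs with the same key land on the same reducer.
        # (Real frameworks use a deterministic hash so re-runs shard identically.)
        return hash(key) % num_reducers

    # e.g., route word-count pairs to 4 reducers
    for word in ("the", "cat", "sat"):
        print(word, "-> reducer", default_partition(word, 4))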
Between the map and reduce stages, the data is shuffled (parallel-sorted / exchanged between nodes) in order to move the data from the map node
that produced it to the shard in which it will be reduced. The shuffle can sometimes take longer than the computation time depending on network
bandwidth, CPU speeds, data produced and time taken by map and reduce computations.
Comparison function: The input for each Reduce is pulled from the machine where Map ran and sorted using the app's comparison function.
Reduce function:The framework calls the application's Reduce function once for each unique key in the sorted order. The Reduce can iterate
through the values that are associated with that key and produce zero or more outputs.
In the word count example, the Reduce function takes the input values, sums them and generates a single output of the word and the final sum.
Output writer: The Output Writer writes the output of the Reduce to the stable storage, usually a distributed file system.
Distribution and reliability: MapReduce achieves reliability by parceling out a number of operations on the set of data to each node in the
network. Each node is expected to report back periodically with completed work and status updates. If a node falls silent for longer than that
interval, the master node (similar to the master server in the Google File System) records the node as dead and sends out the node's assigned work
to other nodes. Individual operations use atomic operations for naming file outputs as a check to ensure that there are not parallel conflicting
threads running. When files are renamed, it’s possible to also copy them to another name in addition to the task name (allowing for side-effects).
Map Reduce-3
The reduce operations operate much the same way. Because of their inferior properties with regard to parallel operations, the master node
attempts to schedule reduce operations on the same node, or in the same rack as the node holding the data being operated on. This property is
desirable as it conserves bandwidth across the backbone network of the datacenter.
Implementations are not necessarily highly reliable. For example, in older versions of Hadoop the
NameNode was a single point of failure for the distributed filesystem. Later versions of Hadoop have high availability with an active/passive
failover for the "NameNode."
Map Reduce-4
Uses: MapReduce is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph
reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning,[5] and statistical machine
translation. Moreover, the MapReduce model has been adapted to several computing environments like multi-core and many-core systems,[6][7]
desktop grids,[8] volunteer computing environments,[9] dynamic cloud environments,[10] and mobile environments.[11]
At Google, MapReduce was used to completely regenerate Google's index of the World Wide Web. It replaced the old ad hoc programs that
updated the index and ran the various analyses.[12] MapReduce's stable inputs and outputs are usually stored in a distributed file system. The
transient data is usually stored on local disk and fetched remotely by the reducers.
Criticism: David DeWitt and Michael Stonebraker, computer scientists specializing in parallel databases and shared-nothing architectures, have
been critical of the breadth of problems that MapReduce can be used for.[13] They called its interface too low-level and questioned whether it
really represents the paradigm shift its proponents have claimed it is.[14] They challenged the MapReduce proponents' claims of novelty, citing
Teradata as an example of prior art that has existed for over two decades. They also compared MapReduce programmers to Codasyl
programmers, noting both are "writing in a low-level language performing low-level record manipulation."[14] MapReduce's use of input files and
lack of schema support prevent the performance improvements enabled by common database system features such as B-trees and hash
partitioning, though projects such as Pig (or PigLatin), Sawzall, Apache Hive,[15] YSmart,[16] HBase[17] and BigTable[17][18] are addressing these problems.
Greg Jorgensen wrote an article rejecting these views.[19] Jorgensen asserts that DeWitt and Stonebraker's entire analysis is groundless as
MapReduce was never designed nor intended to be used as a database. DeWitt and Stonebraker have published a detailed benchmark study in
2009 comparing performance of Hadoop's MapReduce and RDBMS approaches on several specific problems.[20] They concluded that relational
databases offer real advantages for many kinds of data use, especially on complex processing or where the data is used across an enterprise, but
that MapReduce may be easier for users to adopt for simple or one-time processing tasks. They have published the data and code used in their
study to allow other researchers to do comparable studies. Google has been granted a patent on MapReduce.[21] However, there have been claims
that this patent should not have been granted because MapReduce is too similar to existing products. For example, map and reduce functionality
can be very easily implemented in Oracle's PL/SQL database oriented language.[22]
Conferences and users groups
The First International Workshop on MapReduce and its Applications (MAPREDUCE'10) was held with the HPDC conference and the OGF'29
meeting in Chicago, IL. There are MapReduce users groups around the world.
Map Reduce-5
See also
Hadoop, Apache's free and open source implementation of MapReduce.
Pentaho - Open source data integration (Kettle), analytics, reporting, visualization and predictive analytics directly from Hadoop nodes
Nutch - An effort to build an open source search engine based on Lucene and Hadoop, also created by Doug Cutting
Datameer Analytics Solution (DAS) - data source integration, storage, analytics engine and visualization
Apache Accumulo - Secure Big Table
HBase - BigTable-model database
Hypertable - HBase alternative
Apache Cassandra - column-oriented DB that supports access from Hadoop
HPCC - LexisNexis Risk Solutions High Performance Computing Cluster
Sector/Sphere - Open source distributed storage and processing
Cloud computing; Big data; Data Intensive Computing
Algorithmic skeleton - A high-level parallel programming model for parallel and distributed computing
MongoDB - A scalable, high-performance, open source NoSQL database
MapReduce-MPI MapReduce-MPI Library
Specific references:
^ Google spotlights data center inner workings | Tech news blog - CNET News.com
^ "Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." -"MapReduce: Simplified
Data Processing on Large Clusters", by Jeffrey Dean and Sanjay Ghemawat; from Google Research
^ "Google's MapReduce Programming Model -- Revisited" — paper by Ralf Lämmel; from Microsoft
^ http://research.google.com/archive/mapreduce-osdi04-slides/index-auto-0004.html
^ Cheng-Tao Chu; Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Ng, and Kunle Olukotun. "Map-Reduce for Machine
Learning on Multicore". NIPS 2006.
^ Colby Ranger; Ramanan Raghuraman, Arun Penmetsa, Gary Bradski, and Christos Kozyrakis. "Evaluating MapReduce for Multi-core and
Multiprocessor Systems". HPCA 2007, Best Paper.
^ Bingsheng He, et al.. "Mars: a MapReduce framework on graphics processors". PACT'08.
^ Bing Tang, Moca, M., Chevalier, S., Haiwu He and Fedak, G. "Towards MapReduce for Desktop Grid Computing". 3PGCIC'10.
^ Heshan Lin, et al. "MOON: MapReduce On Opportunistic eNvironments". HPDC'10.
^ Fabrizio Marozzo, Domenico Talia, Paolo Trunfio. "P2P-MapReduce: Parallel data processing in dynamic Cloud environments". In: Journal of
Computer and System Sciences, vol. 78, n. 5, pp. 1382--1402, Elsevier Science, September 2012.
^ Adam Dou, et al . "Misco: a MapReduce framework for mobile systems". HPDC'10.
^ "How Google Works". baselinemag.com. "As of October, Google was running about 3,000 computing jobs per day through MapReduce,
representing thousands of machine-days. Among others, these batch routines analyze latest Web pages and update Google's indexes."
^ "Database Experts Jump the MapReduce Shark".
^ a b David DeWitt; Michael Stonebraker. "MapReduce: A major step backwards". craig-henderson.blogspot.com. Retrieved 2008-08-27.
^ "Apache Hive - Index of - Apache Software Foundation".
^ Rubao Lee, et al "YSmart: Yet Another SQL-to-MapReduce Translator" (PDF).
Map Reduce-6
^ a b "HBase - HBase Home - Apache Software Foundation".
^ "Bigtable: A Distributed Storage System for Structured Data" (PDF).
^ Greg Jorgensen. "Relational Database Experts Jump The MapReduce Shark". typicalprogrammer.com. Retrieved 2009-11-11.
^ D. J. Dewitt, M. Stonebraker. et al "A Comparison of Approaches to Large-Scale Data Analysis". Brown University. Retrieved 2010-01-11.
^ US Patent 7,650,331: "System and method for efficient large-scale data processing "
^ Curt Monash. "More patent nonsense — Google MapReduce". dbms2.com. Retrieved 2010-03-07.
General references:
Dean, Jeffrey & Ghemawat, Sanjay (2004). "MapReduce: Simplified Data Processing on Large Clusters". Retrieved Nov. 23, 2011.
Matt WIlliams (2009). "Understanding Map-Reduce". Retrieved Apr. 13, 2011.
External links: Papers
"CloudSVM: Training an SVM Classifier in Cloud Computing Systems"-paper by F. Ozgur Catak, M. Erdal Balaban, Springer, LNCS
"A Hierarchical Framework for Cross-Domain MapReduce Execution" — paper by Yuan Luo, Zhenhua Guo, Yiming Sun, Beth Plale, Judy Qiu;
from Indiana University and Wilfred Li; from University of California, San Diego
"Interpreting the Data: Parallel Analysis with Sawzall" — paper by Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan; from Google Labs
"Evaluating MapReduce for Multi-core and Multiprocessor Systems" — paper by Colby Ranger, Ramanan Raghuraman, Arun Penmetsa, Gary
Bradski, and Christos Kozyrakis; from Stanford University
"Why MapReduce Matters to SQL Data Warehousing" — intro of MapReduce/SQL integration by Aster Data Systems and Greenplum
"MapReduce for the Cell B.E. Architecture" — paper by Marc de Kruijf and Karthikeyan Sankaralingam; from University of Wisconsin–Madison
"Mars: A MapReduce Framework on Graphics Processors" — paper by Bingsheng He, et al from Hong Kong University of Science and
Technology; published in Proc. PACT 2008. It presents the design and implementation of MapReduce on graphics processors.
"A Peer-to-Peer Framework for Supporting MapReduce Applications in Dynamic Cloud Environments" Fabrizio Marozzo, et al University of
Calabria; Cloud Computing: Principles, Systems and Applications, chapt. 7, pp. 113–125, Springer, 2010, ISBN 978-1-84996-240-7.
"Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters" — paper by Hung-Chih Yang, et al Yahoo and UCLA;
published in Proc. of ACM SIGMOD, pp. 1029–1040, 2007. (This paper shows how to extend MapReduce for relational data processing.)
FLuX: the Fault-tolerant, Load Balancing eXchange operator from UC Berkeley provides an integration of partitioned parallelism with process
pairs. This results in a more pipelined approach than Google's MapReduce with instantaneous failover, but with additional implementation cost.
"A New Computation Model for Rack-Based Computing" — paper by Foto N. Afrati; Jeffrey D. Ullman; from Stanford University; Not
published as of Nov 2009. This paper is an attempt to develop a general model in which one can compare algorithms for computing in an
environment similar to what map-reduce expects.
FPMR: MapReduce framework on FPGA—paper by Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, Huazhong Yang (2010), in FPGA '10,
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays.