MapReduce Programming and Cluster Accessing Instructions

Gang Luo

Sept. 2, 2010

Dataflow

(K1, V1) → map → (K2, V2) → shuffle (group by key) → (K2, List<V2>) → reduce → (K3, V3)
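The key/value types in this dataflow can be made concrete with a small single-process Java sketch. It is not Hadoop's API: `MiniMapReduce`, `runJob`, and the record-counting demo are hypothetical names used only for illustration. The mapper turns each (K1, V1) into zero or more (K2, V2) pairs. A `TreeMap` stands in for the shuffle by grouping values into (K2, List<V2>) with keys in sorted order. The reducer turns each group into (K3, V3) pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.BiFunction;

public class MiniMapReduce {
    // Hypothetical in-memory stand-in for a job: map, "shuffle" (group by key), reduce.
    static <K1, V1, K2 extends Comparable<K2>, V2, K3, V3> List<Map.Entry<K3, V3>> runJob(
            List<Map.Entry<K1, V1>> input,
            BiFunction<K1, V1, List<Map.Entry<K2, V2>>> mapper,
            BiFunction<K2, List<V2>, List<Map.Entry<K3, V3>>> reducer) {
        // Map + shuffle: every (K2, V2) the mapper emits is appended to its key's group.
        SortedMap<K2, List<V2>> groups = new TreeMap<>();
        for (Map.Entry<K1, V1> record : input) {
            for (Map.Entry<K2, V2> pair : mapper.apply(record.getKey(), record.getValue())) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        // Reduce: each (K2, List<V2>) group yields zero or more (K3, V3) pairs.
        List<Map.Entry<K3, V3>> output = new ArrayList<>();
        for (Map.Entry<K2, List<V2>> group : groups.entrySet()) {
            output.addAll(reducer.apply(group.getKey(), group.getValue()));
        }
        return output;
    }

    // Demo job (hypothetical): K1 = line number, V1 = "year temperature" text.
    // It counts how many records each year has.
    static List<Map.Entry<String, Integer>> countRecordsPerYear(List<String> lines) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            input.add(Map.entry((long) i, lines.get(i)));
        }
        return runJob(
                input,
                (Long lineNo, String line) -> List.of(Map.entry(line.split(" ")[0], 1)),
                (String year, List<Integer> ones) -> List.of(Map.entry(year, ones.size())));
    }

    public static void main(String[] args) {
        // prints [1983=1, 1998=2]
        System.out.println(countRecordsPerYear(List.of("1998 87", "1983 93", "1998 90")));
    }
}
```

In a real Hadoop job, the framework performs the grouping step between mappers and reducers across machines. The sketch only mirrors the types involved.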

A Query Example

Table1

Year    Temperature    Air Quality
1998    87             2
1983    93             4
2008    90             3
2001    89             5
1965    97             4
…       …              …

SELECT Year, MAX(Temperature)
FROM Table1
WHERE AirQuality IN (0, 1, 4, 5, 9)
GROUP BY Year
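Before turning this query into MapReduce, it helps to pin down what it should return. The following single-machine Java sketch evaluates the query with streams over the five rows listed in Table1. It makes two assumptions: the Air Quality values line up with the rows in the order shown, and the condition `AirQuality = 0|1|4|5|9` means membership in the set {0, 1, 4, 5, 9}. `QueryReference`, `Row`, and `table1()` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class QueryReference {
    // One row of Table1 (only the three columns shown on the slide).
    static final class Row {
        final int year;
        final int temperature;
        final int airQuality;

        Row(int year, int temperature, int airQuality) {
            this.year = year;
            this.temperature = temperature;
            this.airQuality = airQuality;
        }
    }

    // The five rows listed in Table1; the "…" rows are not modeled.
    static List<Row> table1() {
        return List.of(
                new Row(1998, 87, 2), new Row(1983, 93, 4), new Row(2008, 90, 3),
                new Row(2001, 89, 5), new Row(1965, 97, 4));
    }

    // SELECT Year, MAX(Temperature) FROM Table1
    // WHERE AirQuality IN (0, 1, 4, 5, 9) GROUP BY Year
    static Map<Integer, Integer> maxTemperatureByYear(List<Row> table) {
        Set<Integer> goodQuality = Set.of(0, 1, 4, 5, 9);
        return table.stream()
                .filter(r -> goodQuality.contains(r.airQuality)) // WHERE
                .collect(Collectors.toMap(
                        r -> r.year,                              // GROUP BY Year
                        r -> r.temperature,                       // Temperature ...
                        Integer::max,                             // ... aggregated with MAX
                        TreeMap::new));                           // sorted by year for display
    }

    public static void main(String[] args) {
        System.out.println(maxTemperatureByYear(table1())); // {1965=97, 1983=93, 2001=89}
    }
}
```

Under these assumptions, the 1998 and 2008 rows are dropped by the WHERE clause because their quality codes are 2 and 3.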

Implementation in MapReduce

Map (Selection + Projection): ( 1998, 87, 2, … ) → ( 1998, 87 )

Reduce (Aggregation, MAX): ( 1998, [87, 94, 84, 87, 78] ) → ( 1998, 94 )
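The division of work between map and reduce can be sketched as two plain Java functions. This is not Hadoop code: the class name is hypothetical, and the comma-separated record layout "year,temperature,airQuality[,more fields]" is an assumption. Map performs selection (it drops records whose quality code is not in {0, 1, 4, 5, 9}) and projection (it keeps only Year and Temperature). Reduce performs the MAX aggregation for one year, for example over the values 87, 94, 84, 87, 78 grouped under 1998.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class MaxTemperatureFunctions {
    // AirQuality codes accepted by the WHERE clause.
    private static final Set<Integer> GOOD_QUALITY = Set.of(0, 1, 4, 5, 9);

    // Map side: selection + projection on one record.
    // Assumes a hypothetical text layout "year,temperature,airQuality[,more fields]".
    static Optional<Map.Entry<Integer, Integer>> map(String record) {
        String[] fields = record.split(",");
        int year = Integer.parseInt(fields[0].trim());
        int temperature = Integer.parseInt(fields[1].trim());
        int airQuality = Integer.parseInt(fields[2].trim());
        if (!GOOD_QUALITY.contains(airQuality)) {
            return Optional.empty();                       // selection: record is dropped
        }
        return Optional.of(Map.entry(year, temperature)); // projection: (Year, Temperature)
    }

    // Reduce side: MAX over all temperatures grouped under one year.
    // Reduce is only invoked for keys that received at least one value.
    static Map.Entry<Integer, Integer> reduce(int year, List<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return Map.entry(year, max);
    }

    public static void main(String[] args) {
        System.out.println(map("1983,93,4"));   // Optional[1983=93]
        System.out.println(map("2008,90,3"));   // Optional.empty
        System.out.println(reduce(1998, List.of(87, 94, 84, 87, 78))); // 1998=94
    }
}
```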

Mapper

Reducer

Driver

Think more!

• What if we want to get the average temperature for a year?

• What if you are only interested in the temperature in Durham? (Assume the station ID for Durham is 212.)

You may need to change the code slightly to answer a different query.
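Both questions can be answered with small changes to the map and reduce functions. The sketch below is one possible approach, not the course's reference solution. Table1 has no station column, so the record layout "stationId,year,temperature,airQuality" is a hypothetical assumption. Only the station ID 212 comes from the slide. The Durham question adds one more selection predicate on the map side. The average question replaces MAX with AVG on the reduce side.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class ThinkMoreVariants {
    static final int DURHAM_STATION_ID = 212;   // from the slide
    static final Set<Integer> GOOD_QUALITY = Set.of(0, 1, 4, 5, 9);

    // Mapper variant: an extra selection predicate on a hypothetical station ID field.
    // Assumed record layout: "stationId,year,temperature,airQuality".
    static Optional<Map.Entry<Integer, Integer>> mapDurhamOnly(String record) {
        String[] f = record.split(",");
        int stationId = Integer.parseInt(f[0].trim());
        int year = Integer.parseInt(f[1].trim());
        int temperature = Integer.parseInt(f[2].trim());
        int airQuality = Integer.parseInt(f[3].trim());
        if (stationId != DURHAM_STATION_ID || !GOOD_QUALITY.contains(airQuality)) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(year, temperature));
    }

    // Reducer variant: AVG instead of MAX (sum kept in a long to avoid int overflow).
    static Map.Entry<Integer, Double> reduceAverage(int year, List<Integer> temperatures) {
        long sum = 0;
        for (int t : temperatures) {
            sum += t;
        }
        return Map.entry(year, (double) sum / temperatures.size());
    }

    public static void main(String[] args) {
        System.out.println(mapDurhamOnly("212,1998,87,4"));  // Optional[1998=87]
        System.out.println(mapDurhamOnly("305,1998,90,4"));  // Optional.empty
        System.out.println(reduceAverage(1998, List.of(87, 94, 84, 87, 78))); // 1998=86.0
    }
}
```

One design consequence is worth noting. A MAX reducer can also be used as a combiner, because the maximum of partial maxima equals the overall maximum. An average of partial averages is generally not the overall average. For the AVG query, a combiner would therefore have to pass (sum, count) pairs on to the reducer instead.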

Hadoop Cluster

• Master node:

– hadoop21.cs.duke.edu

• Slave nodes:

– hadoop22.cs.duke.edu through hadoop36.cs.duke.edu

• Online job tracker *

– hadoop21.cs.duke.edu:50030

• Online HDFS info *

– hadoop21.cs.duke.edu:50070

* You cannot access these pages from outside the CS trusted network. Solution:

Now, let’s see how to compile and run a MapReduce job on the cluster.

Everything I am about to show is also covered by the instructions on the course website: http://www.cs.duke.edu/courses/fall10/cps216/Project/cluster_instruction
