Let Mapreduce Programs Fly

Tang Zhenkun
Email: tangzk2011@163.com
Overview
Mapreduce Basics
 Hadoop Counters
 Hadoop Log Info(slf4j)
 Unit Test(JUnit, MRUnit)
 Guava(Google Core Libraries for Java 1.6+)
 Others
 References

Mapreduce Basics

Hadoop job submit flow
Hadoop Web GUI
What if something goes wrong?
1. The details of a running job are not visible.
2. There is no step-through debugging.
Don't just pray!
Errors

Command errors and syntax errors
Check, check, and check again.
Logic errors
These are the errors we really need to deal with.
Hadoop Counters

Hadoop Standard Counters
Map output records
 Reduce output records


Custom Counters
How to customize a MapReduce counter?
Input file
context.getCounter(counterName);
context.getCounter(groupName, counterName);
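
The two getCounter calls above return a counter that a task then increments. The following is a minimal sketch of the idea, written so that it compiles without Hadoop on the classpath: the enum RecordQuality, the tab-separated record layout, and the class name CounterSketch are assumptions made for illustration, and the actual Hadoop call appears only in a comment.

```java
import java.util.EnumMap;
import java.util.Map;

public class CounterSketch {
    // Hypothetical counter names. With context.getCounter(Enum), Hadoop
    // groups the counters by the enum's class name.
    enum RecordQuality { VALID, EMPTY, MALFORMED }

    // Pure classification logic, kept separate from Hadoop so it can be
    // tested locally. Assumes records look like "key<TAB>value".
    static RecordQuality classify(String line) {
        if (line == null || line.trim().isEmpty()) {
            return RecordQuality.EMPTY;
        }
        String[] fields = line.split("\t");
        return fields.length == 2 ? RecordQuality.VALID : RecordQuality.MALFORMED;
    }

    // Local stand-in for the job counters: tallies one count per input line.
    static Map<RecordQuality, Long> tally(String... lines) {
        Map<RecordQuality, Long> counts = new EnumMap<>(RecordQuality.class);
        for (String line : lines) {
            // Inside a real Mapper.map(...) this line would instead be:
            //   context.getCounter(classify(line)).increment(1);
            counts.merge(classify(line), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally("a\t1", "", "broken line", "b\t2"));
    }
}
```

Counting malformed or empty records this way makes logic errors visible in the job's counter summary without stepping through the code.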
Hadoop Log Info

Stdout does not work, so do not rely on System.out.println().
Use a logger instead, e.g. log4j or slf4j.
Hadoop Log Info – Slf4j
SLF4J: Simple Logging Facade for Java.
 Simple, easy to use.

Unit Test

TDD: Test-Driven Development
Unit Test – JUnit
JUnit(Unit Test for Java)
 #Unit(for C#)
 XUnit

How to write unit tests using JUnit?

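The oil-splitting puzzle: two children go to buy oil. One brings an empty one-jin bottle (10 liang, written as 10 ounces below); the other brings an empty 7-liang bottle and an empty 3-liang bottle. They had planned to buy one jin of oil each, but they do not have enough money, so together they buy just one jin (10 liang). On the way home they want to split the oil evenly, but they have no other tools. Using only the three bottles (one jin, 7 liang, 3 liang), measure out exactly two half-jin portions.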

Define a state:
Each component represents one bottle: the 10-ounce, the 7-ounce, and the 3-ounce bottle.
Define the operation: multiAndPlus(X, b)
E.g.: pour 10 ounces from the first (10-ounce) bottle to the third one.
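
The state and pour operation described above can also be explored by brute force. The following is a minimal sketch, not the Mat.java from the slides: it assumes a state is an int array holding the contents of the 10-, 7-, and 3-ounce bottles, that a pour continues until the source is empty or the target is full, and it uses breadth-first search to find a shortest sequence of pours. The names OilPuzzle, pour, and solve are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class OilPuzzle {
    // Bottle capacities in ounces: index 0, 1, 2.
    static final int[] CAPACITY = {10, 7, 3};

    // Pour from bottle `from` into bottle `to` until the source is empty
    // or the target is full. Returns a new state; the input is untouched.
    static int[] pour(int[] state, int from, int to) {
        int amount = Math.min(state[from], CAPACITY[to] - state[to]);
        int[] next = state.clone();
        next[from] -= amount;
        next[to] += amount;
        return next;
    }

    // Breadth-first search over states. Returns the states from start to
    // goal (inclusive), or an empty list if the goal is unreachable.
    static List<int[]> solve(int[] start, int[] goal) {
        Map<String, int[]> parent = new HashMap<>(); // state -> predecessor
        Queue<int[]> queue = new ArrayDeque<>();
        parent.put(Arrays.toString(start), null);
        queue.add(start);
        while (!queue.isEmpty()) {
            int[] current = queue.remove();
            if (Arrays.equals(current, goal)) {
                List<int[]> path = new ArrayList<>();
                for (int[] s = current; s != null; s = parent.get(Arrays.toString(s))) {
                    path.add(s);
                }
                Collections.reverse(path);
                return path;
            }
            for (int from = 0; from < CAPACITY.length; from++) {
                for (int to = 0; to < CAPACITY.length; to++) {
                    if (from == to) continue;
                    int[] next = pour(current, from, to);
                    String key = Arrays.toString(next);
                    if (!parent.containsKey(key)) { // skip states already seen
                        parent.put(key, current);
                        queue.add(next);
                    }
                }
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        for (int[] state : solve(new int[] {10, 0, 0}, new int[] {5, 5, 0})) {
            System.out.println(Arrays.toString(state));
        }
    }
}
```

Small pure functions such as pour are convenient targets for unit tests, since each call can be checked against a hand-computed state.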
MatTest.java
Mat.java
@Test
 @Before, @After
 Assert*


Finally, run the test as a normal Java application.
Unit Test - MRUnit

MRUnit: unit testing for Hadoop MapReduce
How to write unit tests using MRUnit?
MapDriver
 ReduceDriver
 MapReduceDriver

withInput(key, value)
 withOutput(key, value)
 runTest()


Finally, run the test as a normal Java application.
Assertions
"The Art of Assertion" in Chapter 5 of Programming Pearls, Second Edition.
 Assert in Java

assert <boolean expression>
 assert <boolean expression> : <error message>



Note that assertions are disabled by default, so you must enable them explicitly when running the application (java -ea <className>).
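
Both assert forms can be seen in a small sketch. The class AssertSketch and the helper floorSqrt are hypothetical names chosen for illustration; the assertionsEnabled method uses the common idiom of an assignment inside an assert, which only takes effect when the JVM is started with -ea.

```java
public class AssertSketch {
    // Integer square root with its precondition and postcondition
    // written as assertions.
    static int floorSqrt(int n) {
        assert n >= 0 : "n must be nonnegative, got " + n; // precondition
        int r = 0;
        // Use long arithmetic so (r + 1)^2 cannot overflow for large n.
        while ((long) (r + 1) * (r + 1) <= n) {
            r++;
        }
        assert (long) r * r <= n && (long) (r + 1) * (r + 1) > n
                : "postcondition violated for n=" + n;
        return r;
    }

    // The assignment inside the assert runs only when assertions are on,
    // so this returns false unless the JVM was started with -ea.
    static boolean assertionsEnabled() {
        boolean enabled = false;
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
        System.out.println("floorSqrt(10) = " + floorSqrt(10));
    }
}
```

Running it once as java AssertSketch and once as java -ea AssertSketch shows the difference in the first printed line.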
Preconditions in Guava
Guava, Google Core Libraries for Java 1.6+
 Preconditions

checkArgument(i >= 0, "Argument was %s but expected nonnegative", i);
checkArgument(i < j, "Expected i < j, but %s > %s", i, j);
Guava

Other useful libraries.
http://code.google.com/p/guava-libraries/
How to customize a partitioner in Hadoop?
Custom Partitioner
Custom data type: CustomType
Partitioner: return key % 3
When the partitioner is changed to return key / 3 and the number of reduce tasks is set to 4, the output becomes totally ordered across the reducers.
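
The difference between the two partitioning rules can be simulated locally. The following is a minimal sketch in plain Java, assuming integer keys in the range 0 to 11; in a real job the same arithmetic would go in the getPartition(key, value, numPartitions) method of a class extending org.apache.hadoop.mapreduce.Partitioner. The class and method names, and the clamping of out-of-range keys, are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.IntUnaryOperator;

public class PartitionSketch {
    // Modulo partitioning: neighbouring keys are scattered across reducers.
    static int moduloPartition(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    // Range partitioning: each reducer gets a contiguous block of `width`
    // keys. Out-of-range keys are clamped to a valid partition (assumption).
    static int rangePartition(int key, int width, int numPartitions) {
        int p = Math.floorDiv(key, width);
        return Math.max(0, Math.min(numPartitions - 1, p));
    }

    // Simulates the shuffle: keys are grouped by partition, and each
    // partition is kept sorted, as a reducer would see its keys.
    static List<TreeSet<Integer>> shuffle(int[] keys, int numPartitions,
                                          IntUnaryOperator partitioner) {
        List<TreeSet<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new TreeSet<>());
        }
        for (int key : keys) {
            partitions.get(partitioner.applyAsInt(key)).add(key);
        }
        return partitions;
    }

    // True when concatenating partition 0, 1, 2, ... yields sorted keys.
    static boolean isTotallyOrdered(List<TreeSet<Integer>> partitions) {
        int previous = Integer.MIN_VALUE;
        for (TreeSet<Integer> partition : partitions) {
            for (int key : partition) {
                if (key < previous) return false;
                previous = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] keys = {7, 2, 11, 0, 5, 9, 3, 1, 10, 4, 8, 6};
        System.out.println("key % 3, 3 reducers: "
                + shuffle(keys, 3, k -> moduloPartition(k, 3)));
        System.out.println("key / 3, 4 reducers: "
                + shuffle(keys, 4, k -> rangePartition(k, 3, 4)));
    }
}
```

With key / 3, every key in one partition is smaller than every key in the next, so reading the reducer outputs in order gives a globally sorted result.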
Others

Maven: automatic dependency management
Hadoop remote debugging: JDWP (Java Debug Wire Protocol)
HPROF: analysis tools in the JDK
References
Hadoop: The Definitive Guide, Second Edition.
 http://www.junit.org/
 http://incubator.apache.org/mrunit/
 http://code.google.com/p/guava-libraries/
 http://insightfullogic.com/blog/2011/oct/21/5reasons-use-guava/
