Week 10, Testing
• Programming in the small vs. Programming in the large
Parlante’s link: codingbat.com
• unit tests for programming in the small
• Apache rule: before submitting a patch to a Hadoop component, run and pass all of that component's unit tests.
dougc [at] gmail25 dot com 2012 All Rights Reserved
• Hadoop Unit Tests installed in bigtop
• Great reference: http://www.cloudera.com/blog/2008/12/testinghadoop/
• Run tests on downloaded hadoop-0.20.205.0: ant test
• Where are the bigtop shims for hadoop-0.20.205.0/1.0/.22? For Hive/Pig?
• Other shims are available but don't work; you have to pick one at build time. In the latest release: Hive 0.8.1, Pig 0.9.2.
• Symlink src:
ubuntu@ip-10-116-217-28:/usr/lib/hadoop$ sudo ln -s /usr/src/hadoop /usr/lib/hadoop/src
• sudo chmod 757 /usr/lib/hadoop, /usr/lib/hadoop/bin, /usr/lib/hadoop/sbin
• If running in AWS, set up screen
– sudo apt-get install screen screen-profiles screen-profiles-extras
– Type screen; you will see a clear terminal window. Start ant test, press ctrl-a ctrl-d, log out, log in again, and type screen -r to reattach.
• Ron's fix: modify /etc/hostname
• Standalone: logs for each test under ~/hadoop-0.20.205.0/build/test
• Bigtop: /usr/lib/hadoop/build/test
• Copy testConf.xml:
sudo cp /usr/src/hadoop/test/org/apache/hadoop/cli/testConf.xml /home/ubuntu/bigtop-0.2.0-incubating/bigtop-tests/test-execution/smokes/hadoop/target/clitest_data/
• https://issues.cloudera.org/browse/DISTRO-44
• Add a Jackson dependency to pom.xml:
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>1.9.3</version>
</dependency>
• Running a single integration test:
mvn -Dit.test=org.apache.bigtop.itest.hadooptests.CLASS verify
Example:
mvn -Dit.test=org.apache.bigtop.itest.hadooptests.TestTestCLI verify
• In ~/hbase-0.9.2: mvn -P localTest
• Running a single unit test:
mvn test -Dtest=org.apache.hadoop.hbase.TestHServerAddress
• Don't exist? Put them in /usr/src/hbase like Hadoop and use the Groovy shell to run them?
• Project: get HBase unit tests working in bigtop. Partition the HBase unit tests into 2 categories: one approach issues requests, looks at internal state, and verifies; the other uses only public APIs, reading and writing to HBase.
• MiniHBase mock objects in a single JVM process can be used in Bigtop. Different bugs appear in distributed mode vs. MiniMR/DFSCluster. Write this up as a project.
• Pig uses the same test artifact from its unit tests for bigtop.
• Missing pom goals
• Use
– org.apache.bigtop.itest.JUnitUtils.groovy, for annotation support in JUnit4/Groovy.
– org.apache.bigtop.itest.junit.OrderedParameterized.java, an extension of JUnit. JUnit assumes all tests are stateless and order doesn't matter; Bigtop tests are not stateless, so ordering requires run stages. You specify which run stage a test belongs to with simple ints that define the ordering. Tests are in run stage 0 by default; a test case annotated with run stage -1 executes first.
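The run-stage idea can be sketched in plain Java without the Bigtop classes (the class and method names below are hypothetical; only the scheme of "smaller integer stage runs first, default stage 0" comes from the slides):

```java
import java.util.*;
import java.util.stream.Collectors;

public class RunStageSketch {
    // A test name paired with its integer run stage; smaller stages run first.
    static final class StagedTest {
        final String name;
        final int stage; // defaults to 0; a stage of -1 runs before everything else
        StagedTest(String name, int stage) { this.name = name; this.stage = stage; }
    }

    // Order tests by run stage (stable sort keeps same-stage order intact).
    static List<String> executionOrder(List<StagedTest> tests) {
        return tests.stream()
                .sorted(Comparator.comparingInt((StagedTest t) -> t.stage))
                .map(t -> t.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<StagedTest> tests = Arrays.asList(
                new StagedTest("verifyState", 0),
                new StagedTest("provisionCluster", -1),
                new StagedTest("teardown", 1));
        // provisionCluster (stage -1) comes out first, teardown (stage 1) last.
        System.out.println(executionOrder(tests));
    }
}
```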
• PackageManager/abstract class
• What are DEBPackage.groovy, ManagedPackage.groovy, and RPMPackage.groovy for?
• AptCmdLinePackageManager.groovy allows apt-get commands in Groovy?
• YumCmdLinePackageManager, RPMPackage, ZypperCmdLinePackageManager
• Bigtop spends time on packaging (apt-get install and the like); no existing Java APIs do this, so Bigtop installs packages through its own Java API. It is used internally for Jenkins testing; the tests live in test-artifacts/package. Manifest-driven XML files describe what is expected from a package (files with given permissions); the tests check and verify paths and permissions. If you are introducing a new package, you are responsible for this abstract-class testing.
• bigtop-0.2.0-incubating/bigtop-tests/test-execution/smokes/hbase: mvn verify
• /home/ubuntu/bigtop-0.2.0-incubating/system/TestLoadAndVerify.java
• Scale the constants down for a small cluster:
// private static final long NUM_TO_WRITE_DEFAULT = 100*1000;
private static final long NUM_TO_WRITE_DEFAULT = 10;
// private static final int NUM_TASKS = 200;
// private static final int NUM_REDUCE_TASKS = 35;
private static final int NUM_TASKS = 2;
private static final int NUM_REDUCE_TASKS = 2;
• ant test or mvn test
• Project: mavenize Hive (hard)
• Project: Pig (easier?)
• ~/hive-0.7.1/src/build.xml
• ~/pig-0.9.2/build.xml
• mahout-0.6-src, ~/mahout-distribution-0.6; mvn test. Install core and src; there are 2 subdirectories with the same name: ~/mahout-distribution-0.6/mahout-distribution-0.6/pom.xml
• git clone https://github.com/yahoo/oozie.git; mvn test
• git clone https://github.com/cloudera/flume.git; mvn test
• Problem with mvn artifact…
• Hive Unit Tests install own version of Hadoop
• ~/hive-0.7.1/src/build/hadoopcore/
• Remove the test TestHadoopThriftAuthBridge20S.java; it cannot connect to the Thrift Server (socket timeout > 6x).
• Follow the Pig format; work with the Hive unit test authors. Hive integration project suggestion: Hive/Pig sit on top of M/R with a custom language. With a compiler, the unit tests have input and expected output; Hive unit tests are SQL code plus verification afterwards. This was hard to retrofit vs. a real cluster. The approach took *.SQL files from Hive and dumped them into Bigtop to compare actual vs. expected output. Can you reuse the same test artifacts for unit tests and bigtop integration tests? Convert the Hive unit tests.
• Transition to FlumeNG. NG lost features from Flume. Too early.
• What to set oozie_url to? http://localhost:11000/oozie
• Start the oozie service.
• Runs only the Oozie examples.jar. Project: create a workflow for Oozie. Integration testing on a cluster is needed here. Actions to broaden the data interfaces: email actions, the sqoop action. A good project for J2EE developers.
• Where to modify the bigtop install scripts to fix this?
• Create a Java project to test programs using HDFS/Hive/Pig, etc. There are 2 ways to run the files: from the command line or in Eclipse.
• This may be important when debugging in cluster and pseudo-distributed mode.
• The cluster loads the 3 conf/ files: core-site.xml, hdfs-site.xml, mapred-site.xml. Some of the parameters are embedded.
• Java code may not properly initialize these params for cluster operation; this is sometimes hard to debug.
• Command line uses bin/hadoop jarfilename.jar ClassName args
• We did this when running Pi from hadoop-xxx-examples.jar; the test programs are under that jar.
• Set an absolute path for log4j.properties:
PropertyConfigurator.configure("/Users/dc/Documents/workspace/log4j.properties");
• Properties files live outside of the jar. Web search results for adding log4j.properties to the jar are incorrect; web search results for setting the class path are incorrect.
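The underlying point is that a file loaded by absolute path is read straight from the filesystem, never the classpath. A minimal sketch with plain java.util.Properties (the temp file stands in for an external log4j.properties; log4j itself is not required):

```java
import java.io.*;
import java.util.Properties;

public class AbsolutePathProps {
    // Load a properties file from an absolute filesystem path. The
    // classpath is never consulted, which is why configuring log4j
    // with an absolute path works even when the file is not packaged
    // inside the jar.
    static Properties loadFromAbsolutePath(String path) throws IOException {
        Properties p = new Properties();
        try (Reader r = new FileReader(path)) {
            p.load(r);
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file standing in for an external log4j.properties.
        File f = File.createTempFile("log4j", ".properties");
        try (Writer w = new FileWriter(f)) {
            w.write("log4j.rootLogger=INFO, stdout\n");
        }
        Properties p = loadFromAbsolutePath(f.getAbsolutePath());
        System.out.println(p.getProperty("log4j.rootLogger"));
        f.delete();
    }
}
```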
14:03:08,263 INFO TestHDFS:34 - Yo I am logger!!!
created new jobconf finished setting jobconf parameters generateSampeInpuIf inputDirectory:file:/tmp/MapReduceIntroInput exists:true
14:03:08,588 INFO TestHDFS:59 - isEmptyDirectory
14:03:08,595 INFO TestHDFS:65 - num file status:4
14:03:08,596 INFO TestHDFS:75 - file:///tmp/MapReduceIntroInput is not empty
14:03:08,596 INFO TestHDFS:80 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
14:03:08,597 INFO TestHDFS:46 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
14:03:08,598 INFO TestHDFS:111 - Generating 3 input files of random data, each record is a random number TAB the input file name
14:03:25,076 INFO JobClient:589 Map input records=15
14:03:25,076 INFO JobClient:589 Reduce shuffle bytes=0
14:03:25,076 INFO JobClient:589 Spilled Records=30
14:03:25,077 INFO JobClient:589 Map output bytes=303
14:03:25,077 INFO JobClient:589 Total committed heap usage (bytes)=425000960
14:03:25,077 INFO JobClient:589 Map input bytes=302
14:03:25,077 INFO JobClient:589 SPLIT_RAW_BYTES=358
14:03:25,078 INFO JobClient:589 Combine input records=0
14:03:25,078 INFO JobClient:589 Reduce input records=15
14:03:25,078 INFO JobClient:589 Reduce input groups=15
14:03:25,078 INFO JobClient:589 Combine output records=0
14:03:25,079 INFO JobClient:589 Reduce output records=15
14:03:25,079 INFO JobClient:589 Map output records=15
14:03:25,079 INFO TestHDFS:235 - The job has completed.
14:03:25,079 INFO TestHDFS:241 - The job completed successfully.
21:44:36,765 INFO TestHDFS:35 - Yo I am logger!!!
created new jobconf finished setting jobconf parameters generateSampeInpuIf inputDirectory:file:/tmp/MapReduceIntroInput exists:true
21:44:36,983 INFO TestHDFS:60 - isEmptyDirectory
21:44:36,990 INFO TestHDFS:66 - num file status:4
21:44:36,991 INFO TestHDFS:76 - file:///tmp/MapReduceIntroInput is not empty
21:44:36,991 INFO TestHDFS:81 - A non empty file file:///tmp/MapReduceIntroInput/asdf.txt was found
21:44:36,992 INFO TestHDFS:47 - The inputDirectory file:/tmp/MapReduceIntroInput exists and is either a file or a non empty directory
21:44:36,992 INFO TestHDFS:112 - Generating 3 input files of random data, each record is a random number TAB the input file name
21:44:36,999 WARN NativeCodeLoader:52 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:44:37,007 INFO TestHDFS:169 - The job output directory file:/tmp/MapReduceIntroOutput exists and is not a directory and will be removed
21:44:37,022 INFO TestHDFS:235 - Launching the job.
21:44:53,523 INFO JobClient:589 SPLIT_RAW_BYTES=358
21:44:53,524 INFO JobClient:589 Combine input records=0
21:44:53,524 INFO JobClient:589 Reduce input records=19
21:44:53,531 INFO JobClient:589 Reduce input groups=19
21:44:53,531 INFO JobClient:589 Combine output records=0
21:44:53,531 INFO JobClient:589 Reduce output records=19
21:44:53,531 INFO JobClient:589 Map output records=19
21:44:53,532 INFO TestHDFS:237 - The job has completed.
21:44:53,532 INFO TestHDFS:243 - The job completed successfully.
• What you get for free in M/R
• Sorting
• Duplicate Detection
• Design pattern notes: object churn, thread safety in Mappers
– ThreadLocal vs. atomic ivars vs. locks
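The "free" sorting and duplicate detection both fall out of the shuffle: keys arriving at a reducer are sorted and grouped, so duplicates collapse into one reducer call. A minimal sketch in plain Java, simulating the sorted shuffle buffer with a TreeMap (no Hadoop dependency; class and method names are made up for illustration):

```java
import java.util.*;

public class ShuffleSketch {
    // Simulate map -> shuffle -> reduce: the framework sorts keys and
    // groups duplicates, so the reducer sees each distinct key exactly once,
    // in sorted order, with a count of how many times it appeared.
    public static List<String> reduceKeys(List<String> mapOutputKeys) {
        // TreeMap plays the role of the sorted shuffle buffer.
        TreeMap<String, Integer> grouped = new TreeMap<>();
        for (String k : mapOutputKeys) {
            grouped.merge(k, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : grouped.entrySet()) {
            result.add(e.getKey() + "\t" + e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Duplicates of "a" and "b" are detected for free; output is sorted.
        System.out.println(reduceKeys(Arrays.asList("b", "a", "b", "c", "a")));
    }
}
```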
• Hadoop Partitioner, multiple output files or 1 output file.
– Job.setNumReduceTasks(1) same as merge sort
– Default HashPartitioner
– Create your own for filtering, e.g. sending all keys which start with a common prefix to one specific file.
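The partitioning logic can be sketched in plain Java (no Hadoop dependency). The modulo formula below matches the default HashPartitioner; the prefix rule is a hypothetical custom partitioner of the kind the bullet describes:

```java
public class PartitionSketch {
    // Default HashPartitioner logic: mask off the sign bit, then take
    // the hash modulo the number of reduce tasks. With 1 reducer this
    // is always 0, i.e. one fully merge-sorted output file.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom partitioner: route every key with a common
    // prefix to partition 0 (one specific output file) and hash all
    // other keys across the remaining partitions.
    static int prefixPartition(String key, int numReduceTasks) {
        if (key.startsWith("ERROR_")) return 0;
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1);
    }

    public static void main(String[] args) {
        System.out.println(hashPartition("any key", 1));   // one reducer: always 0
        System.out.println(prefixPartition("ERROR_x", 4)); // prefix keys: always 0
    }
}
```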
• Serialization: the Writable interface gives an order-of-magnitude performance improvement
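The size gap is easy to see in plain Java: Writable-style serialization writes only the raw bytes of a field, while default Java serialization ships class metadata with every stream. A self-contained sketch (class and method names are made up for illustration):

```java
import java.io.*;

public class SerializationSketch {
    // Writable-style: write only the raw bytes of the field (an int is 4 bytes).
    static int rawSize(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new DataOutputStream(out).writeInt(value);
        return out.size();
    }

    // Default Java serialization: stream header plus class metadata
    // travel along with the value, inflating it many times over.
    static int serializedSize(int value) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(Integer.valueOf(value));
        }
        return out.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("raw: " + rawSize(42) + " bytes");        // 4 bytes
        System.out.println("serialized: " + serializedSize(42) + " bytes");
    }
}
```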
• HDFS Block R/W
• JobTrackers/TaskTrackers/NameNodes. Each file operation goes directly to the NN.
Tests per Hadoop component (installed under /usr/lib):
• Hbase: unit tests not installed under /usr/lib; under bigtop-tests: IncrementalPELoad.java, TestHBaseCompression.java, TestHBasePigSmoke.groovy, TestHBaseSmoke.java, TestHFileOutputFormat.java, TestLoadIncrementalHFiles.java
• Hive: unit tests not installed under /usr/lib; under bigtop-tests: HiveBulkScriptExecutor.java, IntegrationTestHiveSmokeBulk.groovy, TestHiveSmokeBulk.groovy, TestJdbcDriver.java
• Pig: unit tests not installed under /usr/lib; under bigtop-tests: yes, in Hbase
• Zookeeper: yes
• Mahout: yes; part of components
Test code per Hadoop component:
• Flume: TestFlumeSmoke.groovy
• Hadoop: TestCLI.groovy, TestHadoopSmoke, TestHadoopExamples
• Package test: PackageTestCommon.groovy, StateVerifierZookeeper.groovy
• Hue: no, not needed?
• Oozie: TestOozieSmoke.groovy
• Whirr: yes, part of package test
• SQOOP: IntegrationTestSqoopHive.groovy, IntegrationTestSqoopHbase.groovy
• The Groovy runtime allows shell commands, which lets you use the scripts inside the components, saving debugging time on classpaths and environment files.
• Alternatively, use the Java libraries (DFSCluster, MiniMRCluster), reverse engineer the environment variable settings and the sequence of commands to run; start from an HDFS file system, then work your way up to a Bigtop component.
• Working map reduce programs: run them using mvn verify. Make sure HDFS/Hadoop is running first.
• Assumes HDFS is running
• Configuration conf = new Configuration();
• conf.addResource("mapred-site.xml");
• Shell sh = new Shell("/bin/bash -s");
sh.exec("hadoop fs -mkdir /tmp/test",
        "hadoop fs -copyFromLocal one /tmp/test/one",
        "hadoop fs -cat /tmp/test/one");
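A plain-Java analogue of the Groovy Shell helper above, using java.lang.ProcessBuilder (a sketch under the assumption that /bin/bash is available; the hadoop commands are replaced by a harmless echo so it runs anywhere):

```java
import java.io.*;
import java.util.stream.Collectors;

public class ShellSketch {
    // Run one command line through "/bin/bash -c" and capture its output,
    // the same pattern the Groovy Shell helper wraps for multiple commands.
    static String exec(String command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("/bin/bash", "-c", command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String out;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            out = r.lines().collect(Collectors.joining("\n"));
        }
        p.waitFor();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for e.g. "hadoop fs -cat /tmp/test/one".
        System.out.println(exec("echo hello from bash"));
    }
}
```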
• Integrate unit testing into Bigtop
• More integration testing
• Integrate different versions of Hadoop components (Hbase, Hive, etc.) into Bigtop
• Mavenize an ant-centric Hadoop component: Pig, Hive
• Puppet Lab: Bigtop puppet code used in CDH4 to deploy/test
• Deploying and testing in a cluster