Gang Luo
Sept. 14, 2010
• Source code management
• Version control
• Enable team collaboration
– One central repository, multiple local copies
– Synchronize your local copy with the central one so that everybody sees the latest modifications
You should access the central repository from linux.cs.duke.edu, instead of hadoop21.cs.duke.edu
• Install Git
– PuTTY + Git (for windows)
– Eclipse + EGit (for windows/Linux)
– linux.cs.duke.edu (Git already installed)
– apt-get install git-core (for Ubuntu Linux)
– yum install git-core (for Fedora/other Linux)
• Initialization
– Set your user name and email, and enable colored output (git config --global user.name, user.email and color.ui)
• Clone
– Create a local copy of the remote repository
– git clone ssh://USERNAME@linux.cs.duke.edu/usr/research/proj/git/cps216/USERNAME.git
• Adding files
– git add . (don’t forget the dot, which means everything in the current directory)
• Commit changes
– git commit -m "message" -a
– “message” can be any text you want to appear in the log
• Synchronize with remote repository
– git push
• Push your modification to the central repository
– git pull
• Update your local copy from the central repository
• Put your code in the appropriate directories
– e.g. cps216/assignment1/parta
• Provide a README file
– Briefly describe the organization of your code, the purpose of each class, and how to run your code
• Output key/value type setting
– setOutputKeyClass() and setOutputValueClass() set the key/value types for both map and reduce output by default.
• What if your mapper output types are different from the reducer's?
– Specify the map output types with setMapOutputKeyClass() and/or setMapOutputValueClass()
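The type settings above can be sketched as follows, assuming the old `org.apache.hadoop.mapred` API (the one that provides `JobConf`) is on the classpath. The class name `TypeSetup`, the method name `configureTypes`, and the chosen types are illustrative assumptions, not part of the assignment.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical helper: the mapper emits (Text, IntWritable),
// the reducer emits (Text, LongWritable).
public class TypeSetup {
    public static JobConf configureTypes(JobConf conf) {
        // Final (reduce) output types. Without the calls below,
        // these would also be assumed for the map output.
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        // The map output differs from the reduce output, so declare it explicitly.
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);
        return conf;
    }
}
```

In this sketch only the value type differs between map and reduce, so `setMapOutputValueClass()` alone would be enough; setting both simply makes the intent explicit.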
• Input/output types for combiner
– Input types must be the same as the map output types. (Obviously)
– Output types must also be the same as the map output types. (Why?)
• The combiner is not guaranteed to run on every record. If the combiner had different output types, the reducer would receive two different types.
(K1, V1) → Mapper → (K2, V2) → Combiner → (K2, V2) → Reducer → (K3, V3)
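To see why the combiner must keep the map output types, here is a small plain-Java simulation of a word count. It does not use Hadoop; the class `CombinerTypesDemo` and its methods are hypothetical, and grouping all map output in memory is a simplification of what the framework does. The point is that the reducer accepts the same (K2, V2) input whether the combiner runs zero, one, or several times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerTypesDemo {
    // Mapper: (K1, V1) = (offset, line) -> (K2, V2) = (word, 1), grouped by word.
    static Map<String, List<Integer>> map(String[] lines) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                groups.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return groups;
    }

    // Combiner: (K2, list of V2) -> (K2, V2). Same types in and out.
    static Map<String, List<Integer>> combine(Map<String, List<Integer>> groups) {
        Map<String, List<Integer>> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            List<Integer> partial = new ArrayList<>();
            partial.add(sum);
            out.put(e.getKey(), partial);
        }
        return out;
    }

    // Reducer: (K2, list of V2) -> (K3, V3) = (word, total as Long).
    static Map<String, Long> reduce(Map<String, List<Integer>> groups) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            long total = 0;
            for (int v : e.getValue()) {
                total += v;
            }
            out.put(e.getKey(), total);
        }
        return out;
    }

    // Run the pipeline with the combiner applied a given number of times.
    static Map<String, Long> run(String[] lines, int combinerPasses) {
        Map<String, List<Integer>> groups = map(lines);
        for (int i = 0; i < combinerPasses; i++) {
            groups = combine(groups);
        }
        return reduce(groups);
    }

    public static void main(String[] args) {
        String[] lines = {"a b a", "b a"};
        System.out.println(run(lines, 0)); // {a=3, b=2}
        System.out.println(run(lines, 2)); // {a=3, b=2}
    }
}
```

If `combine` returned, say, `Long` values instead, `reduce` would have to handle both `Integer` and `Long` inputs depending on whether the combiner happened to run.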
• Split a string on the separator “|”
– String.split() treats its argument as a regular expression, in which “|” means alternation, so use the escaped form “\\|”
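A minimal sketch of the escaping issue in plain Java. The class name `SplitDemo` and the sample record are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Escape the pipe so the regex engine treats it as a literal character.
    static String[] splitOnPipe(String line) {
        return line.split("\\|");
    }

    // Alternative: let Pattern.quote() do the escaping.
    static String[] splitQuoted(String line) {
        return line.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String line = "1|Alice|Durham";
        // Unescaped "|" is an empty alternation, so the string is not
        // split into the three fields you expect.
        System.out.println(Arrays.toString(line.split("|")));
        System.out.println(Arrays.toString(splitOnPipe(line))); // [1, Alice, Durham]
        System.out.println(Arrays.toString(splitQuoted(line))); // [1, Alice, Durham]
    }
}
```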
• Need to ship more than one value in one value object?
– Implement your own Writable type, or
– Use Text. “23#16#87” contains three values in one string!
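Both options can be sketched in plain Java. The class `TripleValue` below is hypothetical; it has the two methods that Hadoop's `Writable` interface requires, `write(DataOutput)` and `readFields(DataInput)`, but does not implement the interface so that the example runs without Hadoop. In a real job you would add `implements org.apache.hadoop.io.Writable`. The `pack`/`unpack` pair shows the Text-style alternative with “#” as the delimiter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class TripleValue {
    int a, b, c;

    TripleValue() {} // Writable types need a no-argument constructor

    TripleValue(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }

    // Option 1: Writable-style binary serialization.
    public void write(DataOutput out) throws IOException {
        out.writeInt(a);
        out.writeInt(b);
        out.writeInt(c);
    }

    public void readFields(DataInput in) throws IOException {
        a = in.readInt();
        b = in.readInt();
        c = in.readInt();
    }

    // Option 2: pack the values into one delimited string (what a Text would hold).
    String pack() {
        return a + "#" + b + "#" + c;
    }

    static TripleValue unpack(String s) {
        String[] p = s.split("#");
        return new TripleValue(Integer.parseInt(p[0]),
                Integer.parseInt(p[1]), Integer.parseInt(p[2]));
    }

    // Serialize and deserialize in memory to check write() against readFields().
    static TripleValue roundTrip(TripleValue v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            TripleValue copy = new TripleValue();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        TripleValue v = unpack("23#16#87");
        System.out.println(v.pack());            // 23#16#87
        System.out.println(roundTrip(v).pack()); // 23#16#87
    }
}
```

The string form is quicker to write but costs parsing on every record; the binary form is more compact and type-safe.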
• configure(JobConf conf)
– Put your initialization in this method
– Good place to retrieve parameters from the JobConf (conf.getXXX())
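A minimal sketch of a mapper that reads a parameter in `configure()`, assuming the old `org.apache.hadoop.mapred` API is on the classpath. The class name `ThresholdMapper`, the property name `example.threshold`, and the assumption that each input line holds a single integer are all made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: keeps only input numbers at or above a threshold.
public class ThresholdMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private int threshold;

    // Called once per task before any call to map().
    @Override
    public void configure(JobConf conf) {
        // "example.threshold" is an assumed property name; 0 is the default.
        threshold = conf.getInt("example.threshold", 0);
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // Assumes each input line holds a single integer.
        int n = Integer.parseInt(value.toString().trim());
        if (n >= threshold) {
            output.collect(value, new IntWritable(n));
        }
    }
}
```

The driver would set the value before submitting the job, for example with `conf.setInt("example.threshold", 10)`.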