Hadoop
Ali Sharza Khan
High Performance Computing

Table of Contents
• Hadoop
• Where did Hadoop come from?
• What problems can Hadoop solve?
• Where does Hadoop apply?
• How is Hadoop architected?
• Two main parts of Hadoop
• Conclusion

Hadoop
• What is Hadoop?
  – Open-source project
  – Processes large data sets in parallel

Where did Hadoop come from?
• Grew out of ideas Google published (the Google File System and MapReduce)
• Yahoo, Facebook, Twitter, and LinkedIn are actively contributing to Hadoop.

What problems can Hadoop solve?
• Problems where you have a lot of data
• Analytics that are deep and computationally intensive

Where does Hadoop apply?
• Search engines
• Finance
• Online retail
• Government
• Media and entertainment
• Research institutions and other markets

How is Hadoop architected?
• Every server has 2, 4, or 8 CPUs.
• Each server operates on its own small piece of the data.
• Hadoop clusters at Yahoo span 25,000 servers and store 25 petabytes of application data.
• The largest cluster has 3,500 servers.

Cloudera CEO Interview
http://www.youtube.com/watch?v=qNP4_ICDeqE

Two main parts of Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce framework
  – Map phase
  – Reduce phase
  – JobTracker (the master)
  – TaskTracker (the slave)

MapReduce Framework

Conclusion
• Why is Hadoop able to deal with lots of data?
• Why is Hadoop able to answer complicated computational questions?
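
Example: Map and Reduce phases (sketch)
The Map phase and Reduce phase named under "Two main parts of Hadoop" can be illustrated with a small word count. This is a minimal sketch in plain Python, not Hadoop's actual API. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample input are hypothetical, chosen only for illustration. Everything runs in one process here, whereas on a real cluster the JobTracker would hand each piece of data to a TaskTracker on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one piece of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group the values emitted by all map tasks by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(chunks):
    """Run map, shuffle, and reduce in sequence (single process, illustration only)."""
    mapped = []
    for chunk in chunks:
        # On a cluster, each chunk would be processed by a different server.
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two pieces of data.
chunks = ["hadoop stores data", "hadoop processes data in parallel"]
counts = run_job(chunks)
print(counts)
```

Because each call to map_phase sees only its own chunk, the map work can be spread across many servers. Only the grouped results need to be brought together for the reduce step.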