Hadoop

Ali Sharza Khan
High Performance Computing
Table of Contents
• Hadoop
• Where did Hadoop come from?
• What problems can Hadoop solve?
• Where does Hadoop apply?
• How is Hadoop architected?
• Two main parts of Hadoop
• Conclusion
Hadoop
• What is Hadoop?
– An open-source project
– Processes large data sets in parallel
Where did Hadoop come from?
• Google: Hadoop's design follows Google's published MapReduce and Google File System papers.
• Yahoo, Facebook, Twitter, and LinkedIn are actively contributing to Hadoop.
What problems can Hadoop solve?
• Problems where you have a lot of data
• Running analytics that are deep and computationally intensive
Where does Hadoop apply?
• Search engines
• Finance
• Online retail
• Government
• Media and entertainment
• Research institutions and other markets
How is Hadoop architected?
• Each server has 2, 4, or 8 CPUs.
• Each server operates on its own little piece of the data.
• Hadoop clusters at Yahoo span 25,000 servers and store 25 petabytes of application data.
• The largest cluster has 3,500 servers.
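The idea that each server works on its own piece of the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hadoop code: the function names are made up for this example, threads stand in for servers, and the "work" is a simple sum. The last lines also work out the average data per server implied by the Yahoo figures, assuming 1 petabyte = 1,000 terabytes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(records, num_servers):
    """Deal the records out so each (hypothetical) server gets its own piece."""
    return [records[i::num_servers] for i in range(num_servers)]


def process_piece(piece):
    """The work one server does on its local piece only (here: a sum)."""
    return sum(piece)


def run_on_cluster(records, num_servers):
    """Process every piece independently, then combine the partial results."""
    pieces = split_into_pieces(records, num_servers)
    # Threads stand in for servers; a real cluster uses separate machines.
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        partial_results = list(pool.map(process_piece, pieces))
    return sum(partial_results)


total = run_on_cluster(list(range(1, 101)), num_servers=4)
print(total)  # 5050

# Average data per server from the Yahoo figures (assumes 1 PB = 1,000 TB).
average_tb_per_server = 25 * 1000 / 25_000
print(average_tb_per_server)  # 1.0
```

No server ever needs to see another server's piece, which is why adding servers lets the same job cover more data.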
Cloudera CEO Interview
http://www.youtube.com/watch?v=qNP4_ICDeqE
Two main parts of Hadoop
• HDFS (Hadoop Distributed File System)
• MapReduce framework
– Map phase
– Reduce phase
– JobTracker (the master)
– TaskTracker (the slave)
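The map and reduce phases listed above can be illustrated with the classic word-count example. This is a minimal single-machine sketch in plain Python, not the Hadoop API: the function names are assumptions chosen for this example, and the grouping step between the two phases is done by hand here.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key before they reach the reduce phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data in parallel"])
print(counts["hadoop"], counts["data"], counts["parallel"])  # 2 2 1
```

Each call to map_phase looks at one line only, and each call to reduce_phase looks at one key only, so both phases can be spread across many servers.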
MapReduce Framework
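The master and slave roles in the framework can be sketched as a toy model. The classes below borrow the names JobTracker and TaskTracker only as labels; they are not Hadoop's classes. The round-robin assignment is an assumption made for brevity, and the tasks run one after another rather than in parallel.

```python
from collections import deque


class TaskTracker:
    """Slave: runs the tasks it is handed and records which ones it finished."""

    def __init__(self, name):
        self.name = name
        self.completed = []

    def run(self, task_id, func, data):
        result = func(data)
        self.completed.append(task_id)
        return result


class JobTracker:
    """Master: turns a job into tasks and hands them out to the slaves."""

    def __init__(self, trackers):
        self.trackers = trackers

    def run_job(self, func, pieces):
        pending = deque(enumerate(pieces))  # one task per piece of data
        results = {}
        turn = 0
        while pending:
            task_id, data = pending.popleft()
            # Round-robin choice of slave (a simplifying assumption).
            tracker = self.trackers[turn % len(self.trackers)]
            results[task_id] = tracker.run(task_id, func, data)
            turn += 1
        return [results[i] for i in range(len(pieces))]


trackers = [TaskTracker("slave-1"), TaskTracker("slave-2")]
master = JobTracker(trackers)
word_totals = master.run_job(lambda text: len(text.split()), ["a b", "c d e", "f"])
print(word_totals)  # [2, 3, 1]
```

The master never touches the data itself; it only decides which slave handles which piece and collects the results.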
Conclusion
• Why is Hadoop able to deal with lots of data?
• Why is Hadoop able to answer complicated computational questions?