HADOOP - ShareCourse

100062108 李智宇、 100062116 林威宏、 100062220 施閔耀

Outline
• Introduction
• Architecture of Hadoop
• HDFS
• MapReduce
• Comparison
• Why Hadoop
• Conclusion

What is Hadoop?
• open-source software framework
• processes and stores big data
• easy to use and implement, economical, flexible
• runs on lots of nodes (servers)
• written in Java
• free license
• created by Doug Cutting and Mike Cafarella in 2005

Advantages of Interpreted Language
• Cross-platform (e.g., Windows, Ubuntu, Mac OS X)
• smaller executable program size
• easier to modify during both development and execution

Architecture of Hadoop
[Figure]

Hadoop in Enterprise
[Figure: The Dell representation of the Hadoop ecosystem.]

Who is using Hadoop?
• By 2013, more than half of the Fortune 50 were using Hadoop.

HDFS
• Hadoop Distributed File System
• Client: the user
• Name node: manages and stores the metadata and the namespace of files
• Data node: stores the files
• Each data node periodically sends its status to the name node.

HDFS: Writing data in HDFS
• Each file is divided into blocks (64 MB or 128 MB in size), and each block has three copies on different data nodes.
• The client asks the name node for a list of data nodes sorted by distance and sends the file to the nearest one; that data node then sends the file on to the remaining nodes.
• When the operation above is done, the data node sends "done" to the name node.

HDFS: Reading data in HDFS
• The client sends the filename to the name node, and the name node returns a list of the file's blocks sorted by distance.
• The client uses the list to get the file from the data nodes.
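The write and read paths described in the two HDFS slides above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the HDFS API: the classes `DataNode` and `NameNode`, the functions `write_file` and `read_file`, and the integer `distance` field are all hypothetical names invented here, and every block is placed on the same three nearest nodes for simplicity.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, one of the two block sizes named on the slide
REPLICATION = 3                # three copies of every block, as on the slide


@dataclass
class DataNode:
    """Hypothetical data node: stores block contents keyed by block id."""
    name: str
    distance: int  # assumed distance from the client (smaller is nearer)
    blocks: dict = field(default_factory=dict)


@dataclass
class NameNode:
    """Hypothetical name node: keeps only metadata (which nodes hold which block)."""
    data_nodes: list
    block_map: dict = field(default_factory=dict)

    def nearest_nodes(self, count):
        # Return the data nodes sorted by distance, nearest first.
        return sorted(self.data_nodes, key=lambda node: node.distance)[:count]


def split_into_blocks(data, block_size):
    # Divide the file contents into fixed-size blocks (the last one may be shorter).
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_file(name_node, filename, data, block_size=BLOCK_SIZE):
    placements = []
    for index, block in enumerate(split_into_blocks(data, block_size)):
        block_id = f"{filename}#{index}"
        pipeline = name_node.nearest_nodes(REPLICATION)
        # The client sends the block to the nearest node; here the forwarding
        # to the remaining nodes is modelled as a simple loop over the pipeline.
        for node in pipeline:
            node.blocks[block_id] = block
        placements.append((block_id, [node.name for node in pipeline]))
    # Stand-in for the "done" message: the name node records the placement.
    name_node.block_map[filename] = placements


def read_file(name_node, filename):
    nodes_by_name = {node.name: node for node in name_node.data_nodes}
    parts = []
    for block_id, holders in name_node.block_map[filename]:
        # Holders are listed nearest first, so read from the first one.
        parts.append(nodes_by_name[holders[0]].blocks[block_id])
    return b"".join(parts)


if __name__ == "__main__":
    cluster = [DataNode("dn1", 3), DataNode("dn2", 1),
               DataNode("dn3", 2), DataNode("dn4", 5)]
    name_node = NameNode(cluster)
    write_file(name_node, "a.txt", b"hello hadoop", block_size=5)
    print(read_file(name_node, "a.txt"))
```

The tiny `block_size=5` in the usage lines is chosen only so that a short string is split into several blocks; the name node never touches file contents, which mirrors the metadata-only role given to it on the HDFS slide.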

HDFS: failure
• node failure
• communication failure
• data corruption

HDFS: handle failure
• Handling a write failure: the name node skips any data node that does not return an ACK.
• Handling a read failure: recall that when reading a file, the client gets a list of the data nodes that contain the file, so it can fall back to another node on that list.
• Handling a node failure: the name node finds out which data the failed node held, copies that data from the other nodes that hold it, and restores it to other data nodes.
• Note that HDFS cannot guarantee that at least one copy of the data survives.

MapReduce
• similar to divide-and-conquer
• First, use "Map" to divide the tasks.
• Second, use "Shuffle" to "transfer the data from the mapper nodes to a reducer's node and decompress if needed."
• Third, use "Reduce" to "execute the user-defined reduce function to produce the final output data."

MapReduce: Map
[Figure 2: Execution of a map task showing the ...]

MapReduce: Shuffle
[Figure 1: Execution of a MapReduce job.]

MapReduce: Reduce
[Figure]

MapReduce
[Figure]

Comparison
[Figures]

Why Hadoop? Technically
[Figure: Comparison of Grep Task Result with Vertica and DBMS-X]
• Simple structure vs. optimization
• Transaction time not minimized
• Lower performance with the same number of nodes
• No compelling reason to choose Hadoop
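
The Map, Shuffle, and Reduce stages named on the MapReduce slides can be illustrated with the classic word-count example. The sketch below runs in a single Python process and assumes nothing about the Hadoop API; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical and chosen here only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_outputs):
    """Shuffle: gather all values for the same key so one reducer sees them together."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: the user-defined function; here it sums the counts for one word."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for the input split handled by one mapper.
    mapped = [map_phase(document) for document in documents]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big nodes"]))
```

In a real cluster the mappers and reducers would run on different nodes and the shuffle would move data over the network; here the three stages are ordinary function calls so that the data flow between them is easy to follow.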

Why Hadoop? Commercially
• Cheap (buy more servers to beat a DBMS)
• Flexible (in both design and deployment)
• Easier to design
• Easier to scale up
• Can be combined with other systems to achieve better performance

Conclusion
• Hadoop is much easier for users to implement and more economical.
• MapReduce advocates should study the techniques used in parallel DBMSs.
• Hybrid systems are also popular.
• As its performance improves, we believe Hadoop will lead the trend in big data computing.

Reference
• http://hadoop.apache.org/
• http://www.runpc.com.tw/content/cloud_content.aspx?id=105318
• http://en.wikipedia.org/wiki/Apache_Hadoo
• https://www.facebookbrand.com/
• http://assets.fontsinuse.com/static/use-media-items/15/14246/full2048x768/522903b7/Yahoo_Logo.png
• http://wiki.apache.org/hadoop/PoweredBy
• http://semiaccurate.com/assets/uploads/2011/09/Amazon-logo.jpg
• http://www.conceptcupboard.com/blog/wpcontent/uploads/2013/09/google.jpg
• http://datashieldcorp.com/files/2013/11/adobe-LOGO-2.jpg
• http://upload.wikimedia.org/wikipedia/commons/7/77/The_New_York_Times_logo.png
• http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/hadoop-introduction.pdf
• http://hadoop.intel.com/pdfs/IntelDistributionReferenceArchitecture.pdf
• http://www.google.com.tw/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CDQQFjAB&url=http%3A%2F%2Fwww.classcloud.org%2Fcloud%2Frawattachment%2Fwiki%2FHinet100402%2F02.HadoopOverview.pdf&ei=IE2XUtLfBMfxiAea_oHQCA&usg=AFQjCNFoIXxLJrOnoul4cKJpQ8v3_kuTYg
• http://www.accenture.com/SiteCollectionDocuments/PDF/Accenture-Hadoop-Deployment-Comparison-Study.pdf
• https://www.google.com.tw/url?sa=t&rct=j&q&esrc=s&source=web&cd=1&ved=0CCkQFjAA&url=http%3A%2F%2Fwww.psgtech.edu%2Fyrgcc%2Fattach%2FMAP%2520REDUCE%2520PROGRAMMING.ppt&ei=7lGXUtvCJsy5iAfWtYH4Bw&usg=AFQjCNGWRKJLaltvbvORULZV6_Te2y74g&sig2=Ba77ihsV1SEqcNeEFkRzfg
• https://www.cs.duke.edu/starfish/files/hadoop-models.pdf
• http://dotnetmis91.blogspot.tw/2010/04/hdfs-hadoopmapreduce.html
• http://wiki.apache.org/hadoop/HDFS
• http://www.ewdna.com/2013/04/Hadoop-HDFS-Comics.html
• http://en.wikipedia.org/wiki/Interpreted_language
• A Comparison of Approaches to Large-Scale Data Analysis by Sam Madden
• http://www.cc.ntu.edu.tw/chinese/epaper/0011/20091220_1106.htm
• http://web.cs.wpi.edu/~cs561/s12/Lectures/6/Hadoop.pdf
• http://www.mobilemartin.com/mobile/show-me-the-mobilemoney.jpg
