Massive Data Processing

http://net.pku.edu.cn/~course/cs402/2014
Yan Hongfei (闫宏飞)
School of Electronics Engineering and Computer Science, Peking University
7/1/2014
Outline
• What is MDP?
• MDP course schedule and content
Massive Data Processing
• Data-intensive information processing
– the relevant datasets are too large to fit in memory and must be held on disk.
– data-intensive processing is beyond the capability of
any individual machine and requires clusters
• Big data problems
• Focus on MapReduce programming
• An entry-level course
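Characteristics of Big Data
• Volume refers to its complexity
– Many small datasets with complex structure are considered big data even though they take up little physical space.
– A large database that takes up a lot of storage but has a simple structure is not considered big data.
• Variety refers to its multiple kinds of structure
– For example: mixed-structure, semi-structured, and unstructured data such as text, audio, and video.
• Velocity refers to the rate at which it is generated and analyzed
– Some applications require real-time or near-real-time processing.
• Veracity, Value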
What is MapReduce?
• Programming model for expressing distributed
computations at a massive scale
• Execution framework for organizing and
performing such computations
• Open-source implementation called Hadoop
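To make the programming model concrete, here is a minimal single-process sketch of word counting in the MapReduce style. It is not Hadoop code: `run_mapreduce` is a hypothetical stand-in for the execution framework, and `map_words`, `reduce_counts`, and the sample documents are illustrative names and data that do not come from the course material.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Mapper: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reducer: sum all partial counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for the execution framework."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # "Shuffle" step: group intermediate values by key.
            groups[out_key].append(out_value)
    # Reduce step: one reducer call per distinct key.
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


# Illustrative input: (document id, document text) pairs.
docs = [("d1", "big data needs big clusters"), ("d2", "data on disk")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

In a real framework such as Hadoop, the mapper and reducer calls would run in parallel across a cluster and the grouping step would move data over the network; the sketch only shows the shape of the two user-supplied functions.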
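Course Organization and Schedule
• Class time
– Tuesdays and Thursdays (starting at 8:30), Teaching Building 3, Room 201
– Lecturers: Yan Hongfei (闫宏飞), Peng Bo (彭博)
– Teaching assistants: Li Sui (李睢), Jiang Han (江翰)
• Teaching components
– Lectures, assignments, hands-on lab guidance, Q&A sessions
• Grading
– The course centers on assignments; grades are based on assignments and reports
• Course website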
– Web http://net.pku.edu.cn/~course/cs402/2014
– Group http://groups.google.com/group/cs402pku
Textbooks
• [Lin] Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, 2013.1.
• [Tom] Tom White, Hadoop: The Definitive Guide, O'Reilly, 3rd ed., 2012.5.
This schedule is tentative and subject to change without notice
Weeks (ID): Week1, Week2, Week3, Week4
• Topic: Introduction to MapReduce
– Contents: Why large data?; Cloud Computing; Value of big data
– Reading: [Lin] Ch1: Introduction; [Tom] Ch1: Meet Hadoop
• Topic: MapReduce Basics
– Contents: How do we scale up?; MapReduce; HDFS
– Reading: [Lin] Ch2: MapReduce Basics; [Tom] Ch6: How MapReduce Works; [GFS & MapReduce Paper]
• Topic: MapReduce Algorithm Design
– Contents: MapReduce program development; Basic MapReduce algorithm design and design patterns
– Reading: [Tom] Ch5: Developing a MapReduce Application; [Lin] Ch3: MapReduce Algorithm Design
• Topic: Text Retrieval
– Contents: Introduction to Information Retrieval; Inverted Index on MapReduce; Retrieval Problems
– Reading: [Lin] Ch4: Inverted Indexing for Text Retrieval
• Topic: Graph Algorithm
– Contents: Graph Algorithm and MapReduce; Parallel Breadth-First-Search; PageRank
– Reading: [Lin] Ch5: Graph Algorithms
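As a preview of the text retrieval topic, the following is a minimal in-memory sketch of building an inverted index with a mapper and a reducer. It is an illustration under assumptions, not the course's or the textbook's reference code: the function names, the `(doc_id, term_frequency)` posting format, and the sample documents are all hypothetical.

```python
from collections import defaultdict


def map_postings(doc_id, text):
    """Mapper: emit (term, (doc_id, term frequency)) once per distinct term."""
    freqs = defaultdict(int)
    for term in text.lower().split():
        freqs[term] += 1
    for term, tf in freqs.items():
        yield term, (doc_id, tf)


def reduce_postings(term, postings):
    """Reducer: gather all postings for one term, sorted by document id."""
    return term, sorted(postings)


def build_index(docs):
    """Hypothetical single-process driver that groups mapper output by term."""
    groups = defaultdict(list)
    for doc_id, text in docs:
        for term, posting in map_postings(doc_id, text):
            groups[term].append(posting)
    return dict(reduce_postings(t, ps) for t, ps in groups.items())


# Illustrative input: (document id, document text) pairs.
docs = [
    ("d1", "one fish two fish"),
    ("d2", "red fish blue fish"),
    ("d3", "one red bird"),
]
index = build_index(docs)
print(index["fish"])
```

Each entry of `index` maps a term to its postings list, so answering a one-word query amounts to a single dictionary lookup.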
Course Registration
• Register for the course individually, using a web browser
– http://net.pku.edu.cn/~course/cs402/2014/regcourse.html
Thank You!
Q&A