Mining of Massive Datasets

Mining Massive Datasets
Course Overview
Mining Massive Datasets
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Lecture 0: Course Overview
General Information
  Instructor: Wu-Jun Li (李武军)
  Email: liwujun@cs.sjtu.edu.cn
  Homepage: http://www.cs.sjtu.edu.cn/~liwujun
  Office: Rm 3-537, SEIEE Building
  Office Hours: Tue 14:00 - 15:00
 Course web site:
http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html
 Teaching Assistant: Zhi-Qin Yu (余志琴)
 Email: xiaoyu199175@gmail.com
 Office Hours: TBD; Rm 3-503, SEIEE Building
  Time and Venue: Mon 14:00 - 15:40; Wed 10:00 - 11:40; Fri 08:00 - 09:40;
Rm 105, Dong Shang Yuan (东上院 105)
Textbook
 Anand Rajaraman and Jeffrey D. Ullman.
Mining of Massive Datasets. Cambridge
University Press, 2011.
You can download it from the book website
(http://i.stanford.edu/~ullman/mmds.html).
Reference Books
  Jiawei Han and Micheline Kamber. Data Mining:
Concepts and Techniques. Morgan Kaufmann, Second
Edition, 2006.
(The English reprint edition can be bought through
China-Pub.)
 Christopher M. Bishop. Pattern Recognition and
Machine Learning. Springer, 2006.
 Chuck Lam. Hadoop in Action. Manning Publications,
First Edition, 2010.
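  Jingyu Zhou, Wu-Jun Li, and Minyi Guo. 《飞天开放平台编程指南-阿里云计算的实践》
(Apsara Open Platform Programming Guide: Alibaba Cloud Computing in Practice).
Publishing House of Electronics Industry, March 2013.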
Course Topics
 Data-Intensive Scalable Computing (DISC)
 Cloud Computing
 MapReduce and Hadoop
 Data Mining and Machine Learning
 Basics: supervised learning; unsupervised learning; matrix
factorization
 Large-scale (distributed) implementations with Hadoop
 Data-Intensive Applications
 Search, link analysis, recommender systems, mining data
streams, advertising on Web
Prerequisites
  Data structures
 Design and analysis of algorithms
 Linear algebra
 Probability theory
  Programming languages: Java, C++
Grading Scheme
 Class attendance (10%)
 Homework (20%)
  Final exam (40%)
 Project (30%)
 3 students / group
Late Assignments
 Assignments turned in late will be penalized 20% per
late day
Academic Honor Code
  Honesty and integrity are central to academic
work.
 All your submitted assignments must be entirely your
own (or your own group's).
  Any student found cheating or plagiarizing
will receive a final score of zero for this course.
Questions?