Department of Statistics

Graduate Program
R253700
Statistical Data Mining
(統計資料採礦)
Spring 2011
The mission of the Department of Statistics is to cultivate quality professionals with enthusiasm
and global perspectives.
Graduate Program Learning Goals (goals covered by this course are indicated):

1. Graduate students should be able to communicate effectively verbally and in writing.
2. Graduate students should solve strategic problems with a creative and innovative approach.
3. Graduate students should demonstrate the leadership skills and ethics demanded of a person in authority.
4. Graduate students should possess a global economic and management perspective.
5. Graduate students should possess the necessary skills and values demanded of a true professional.

Instructor/開課教師:
Shuen-Lin Jeng /鄭順林
sljeng@stat.ncku.edu.tw
(06)2757575#53640
Prerequisite/先修科目 :
Statistics, Regression Analysis
Course Description/課程概述:
Statistical data mining is a recently popular research area. Its main goal is to use
statistical methods to uncover hidden, useful information in high-dimensional, large-volume,
or complex data, information that can be critical for decision makers.
Course Objectives/教學目標:
This course introduces data mining methods from a statistical point of view.
Using the software Splus, R, Insightful Miner, SAS Enterprise Miner, and SQL Server,
students will develop the ability to analyze massive and complicated data and to
turn raw data into valuable information.
Content Summary/授課課程大綱明細:
1. Linear Methods for Regression;
2. Linear Methods for Classification;
3. Basis Expansion;
4. Kernel Smoothing Methods;
5. Bayes Classifier;
6. Model Assessment and Selection;
7. EM, MCMC and Bagging;
8. Additive Models, Trees and MARS;
9. Boosting Trees;
10. Neural Networks;
11. Support Vector Machines;
12. Prototype Methods and Nearest-Neighbors;
13. Association Rules;
14. Genetic Algorithm;
15. Cluster Analysis;
16. Principal Components;
17. Independent Component Analysis;
18. Random Forests;
19. High Dimensional Problems.
Textbook/教材課本:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., by
Hastie, Tibshirani, and Friedman, Springer, 2009 (全華).
http://www-stat.stanford.edu/~tibs/ElemStatLearn/
Recommended references/參考書目:
1. Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006 (歐亞).
2. Pattern Recognition and Machine Learning, by C. M. Bishop, Springer, 2006.
3. Data Mining Methods and Models, by Daniel T. Larose, Wiley, 2006.
4. Next Generation of Data-Mining Applications, edited by Mehmed M. Kantardzic and
Jozef Zurada, Wiley, 2005.
5. Introduction to Machine Learning, by Ethem Alpaydin, The MIT Press, 2004.
6. Data Mining 資料採礦: 顧客關係管理暨電子行銷之應用 (Data Mining: Applications in Customer
Relationship Management and E-Marketing), by 裴瑞 and 林諾夫, 維科出版社, 2001.
Course Requirement/課程要求:
Regular attendance at lectures and computer lab (software practice) sessions
Grading Policy/評量方式:
1. Homework 30%
2. Project Report 30%
3. Final Exam 40%
Grading Policy for AACSB Multiple Assessment:
Assessment components and weights: HW 30%, Project 30%, Final 40%
COMMU: Oral Communication/Presentation; Written Communication
CPSI: Creativity and Innovation; Problem Solving; Analytical & Computational Skills
LEAD: Leadership & Ethics; Social Responsibility
GLOB: Global Awareness
VSP: Values, Skills & Profess.; Information Technology; Management Skills
Competency percentages as listed: 30%, 40%, 30%, 50%, 30%, 20%, 20%, 30%, 50%