CS525 DATA MINING
COURSE INTRODUCTION
YÜCEL SAYGIN
SABANCI UNIVERSITY

Contact Info
ysaygin@sabanciuniv.edu
http://people.sabanciuniv.edu/~ysaygin
Tel: 9576
No specific office hours. You can drop by anytime you like. Email or call me to make sure I am at the office.
Faculty of Engineering and Natural Sciences, Computer Science and Engineering Program

Course Info
Reference Book: Data Mining: Concepts and Techniques
Authors: Jiawei Han and Micheline Kamber
Publisher: Morgan Kaufmann

Course Info
Grading:
- Midterm: 30% (April 14-18)
- Homework: 10%
- Project: 30%
- Paper presentation: 10%
- Term paper: 10%
- Attendance during paper presentations: 10%

Topics that will be covered
- Different data mining techniques
  - Association rules
  - Classification
  - Clustering
- Data mining and security issues
- Applications of data mining
- Data warehousing

Aim of the course
- Knowledge: to introduce data mining concepts
- Skills:
  - paper reading and presentation
  - research and/or project work

A Rough Schedule
- March, April, and the first week of May:
  - Lectures on various data mining techniques
  - Invited speakers from industry to share their experiences
- Remaining 4 weeks:
  - Paper presentations and in-class discussions of research issues

What I will do
- Give the basics on data mining:
  - broad data mining concepts
  - research issues
- Project supervision:
  - Give directions and advice on the projects I proposed (described in the following slides)
- Coordination of the presentations

What I expect you to do
I expect you to do work that matches your background and expertise.
- Students with a CS background will do projects involving implementation and/or research.
- Others can do application projects:
  - on a real application,
  - involving data collection, cleaning, etc.,
  - using at least two data mining tools, which will be compared in terms of functionality for the chosen application.

What I expect you to do
- Understand the basic data mining concepts.
- Choose a specific area and two related papers on the same topic to present in class.
- Attendance is required for paper presentations; you will lose 2% of your overall grade for each presentation you miss.
- Write a term paper on the two papers you presented.
- Do a project and write a final report describing what you learned or achieved in the scope of the project.

Projects
- Data Mining and Game Theory. Will be co-supervised with Ozgur Kibris from Economics (mostly research and survey; may involve algorithm design. Good for students in SLP.)
- Implementation of algorithms for data security against data mining methods (pure algorithms survey and implementation; good for CS students who like implementation)

Projects
- Development of algorithms for protecting sensitive data against various data mining algorithms (research and implementation; good for CS students). Hiding sequential patterns in temporal data by changing time granularities is an example.
- Survey and implementation of the existing privacy-preserving data mining methods (pure implementation; good for CS students)

Introduction to Data Mining
- Why do we collect and process historical data?
- What is the purpose of data mining?
- What are the applications?

Introduction to Data Mining
- Data is mostly stored in data warehouses.
- Data mining techniques are used to analyse the data:
  - association rule finding from transactional data
  - clustering of data with multiple dimensions
  - classification of given data into predefined classes
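
As a minimal illustration of the first technique, the sketch below scores one candidate association rule A => C over transactional data using the two standard measures: support (the fraction of transactions containing every item of A and C) and confidence (the number of transactions containing A and C divided by the number containing A). The transactions, item names, and the functions count, support, and confidence are hypothetical, chosen only for illustration. This is not a rule-mining algorithm such as Apriori; it only evaluates a single given rule.

```python
# Hypothetical toy market-basket data; each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Share of transactions containing `lhs` that also contain `rhs`."""
    n_lhs = count(lhs, transactions)
    if n_lhs == 0:
        return 0.0  # rule never applies, so report zero confidence
    return count(set(lhs) | set(rhs), transactions) / n_lhs


if __name__ == "__main__":
    lhs, rhs = {"diapers"}, {"beer"}
    print("support:", support(lhs | rhs, TRANSACTIONS))
    print("confidence:", confidence(lhs, rhs, TRANSACTIONS))
```

For the rule {diapers} => {beer} on this toy data, three of the five transactions contain both items (support 0.6), and three of the four transactions containing diapers also contain beer (confidence 0.75). Confidence is computed from integer counts rather than by dividing two supports, which avoids an unnecessary floating-point rounding step.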