Syllabus
CS479(7118) / 679(7112): Introduction to Data Mining, Spring 2008
william.perrizo@ndsu.edu
Course web site: http://www.cs.ndsu.nodak.edu/~perrizo/classes/#0
Text: Data Mining, Han and Kamber, 2nd edition.
Office Hours: MWF 11-11:50, in IACC 258 A15 (others by appointment).

Please use email for questions that can be emailed. If you have a question that cannot be adequately stated or answered by email, please use the office hours. Please do not come to office hours if you have a cold, flu, or another infection (until it is no longer infectious). Thank you so much.

All assignments and your term paper must be SUBMITTED THROUGH BLACKBOARD. (DO NOT email them to william.perrizo@ndsu.edu as previously instructed.) All records will be kept on the Blackboard system and will be available to you from there. When submitting your assignments and term paper through BLACKBOARD, please identify your work by using your first_name.last_name just as it appears in your NDSU email address (e.g., mine is william.perrizo).

Lectures and lecture notes are available from http://www.cs.ndsu.nodak.edu/~perrizo/classes/#0 and also from BLACKBOARD. Additional materials are also available on the website.

COURSE DESCRIPTION
Topics: Introduction to Data (data processing, data warehousing and data cubes); and Data Mining (association rule mining, classification or prediction, and clustering).

COURSE OBJECTIVES: Understand the fundamentals of data mining. Gain experience in data mining research and in the written reporting of it.

TERM PAPER (100 points): Each student will pick an application area (or focus area) in which to concentrate. Your focus area can be an area of application of data mining such as Bioinformatics, Medical Computer Aided Detection, etc. (Chapters 8, 9, 10 and 11 are rich with suggested application areas. Read those chapters to get help choosing a focus area. Your term paper and your assignment solutions will be directed toward your focus area.)
You should choose your focus area very early (first week!). You can set or change your focus area at any time by emailing your choice to me. Each student will have a unique focus area (first come, first served via my email queue). Changing focus areas is even encouraged, as it will give you a chance to learn about more than one focus area. Note that your term paper should be in the focus area you end up with. Each assignment solution must describe relevance to your posted focus area at the due date of that assignment (see below).

Your term paper will be on a topic from that focus area (some example topics and focus areas are available in HTML at Possible Topics and in PowerPoint at Possible Topics) or on your own RESEARCH topic, but it must be a new RESEARCH idea of yours (NOT A PAPER written by someone else, or a paper written for another course or for conference or journal publication). Included in the Possible Topics files is a complete set of guidelines on what to include in your paper and what format to use. Note that the guidelines are also available from the Blackboard system.

Research the topic and write a quality paper (publishable in archival media?). Topics will be approved first come, first served (email title and abstract to william.perrizo@ndsu.edu). Papers are graded on contribution, level of current research interest, depth, correctness, clarity, and insight. 679 students, as graduate students, will be expected to achieve a higher level of true research in their paper.

Assignments (70 points): Each chapter has more than 10 exercises at the back. Please choose any 10 to solve and upload your solutions to Blackboard by the due date (see below). Please also make all of your solutions relevant to your chosen FOCUS AREA. Every solution should answer the question, "How does this apply to my FOCUS AREA specifically?"
Each assignment solution must describe relevance to your posted focus area as of the due date of that assignment.

COURSE Assignments:
Course website: http://www.cs.ndsu.nodak.edu/~perrizo/classes/#0
Assignment 1 is due January 18, 5PM (10 chapter 1 exercises) (10 points)
Assignment 2 is due February 1, 5PM (10 chapter 2 exercises) (10 points)
Assignment 3 is due February 15, 5PM (10 chapter 3 exercises) (10 points)
Assignment 4 is due February 29, 5PM (10 chapter 4 exercises) (10 points)
Assignment 5 is due March 14, 5PM (10 chapter 5 exercises) (10 points)
Assignment 6 is due March 28, 5PM (10 chapter 6 exercises) (10 points)
Assignment 7 is due April 11, 5PM (10 chapter 7 exercises) (10 points)
The Term Paper is due May 9, 5PM (100 points)
The Final Exam will be an oral exam over your paper and chapters 5, 6, 7 (70 points)
Grades will be based on a grade curve of your total points out of 240 points.

On all assignments, you must work alone. Please do not share your work with anyone or accept work shared by anyone else. Submit assignments and the paper through BLACKBOARD.

You can schedule your final exam with me for any 20 minute period between 11 and 11:50 MWF by emailing me your choice (first come, first served via my email queue). You must schedule your exam by March 14, but you can schedule it (and take it, if you choose) any time before that too.

COURSE DESCRIPTION (continued)
REQUIRED MATERIALS: The text, email, and WWW access are required.
STUDENTS NEEDING SPECIAL ACCOMMODATIONS or who have special needs are invited to share that information with the instructor.
PREREQUISITES: CS366 or equivalent. Students must be able to read and follow technical, detailed instructions and adapt solutions.
ACADEMIC HONESTY: Work must be completed in a manner consistent with NDSU Senate Policy 335: Code of Academic Responsibility and Conduct.

The goals of this course include initiating students into data and data mining systems research and enhancing students' written presentation skills.
Additional reference material on all topics in this course can be found on the web by doing a Google (or Yahoo or Ask) search on the appropriate keyword(s) and also by using the NDSU library.

Good luck in your 479/679 course!

Focus Areas and Term Paper Titles chosen so far
Columns: Date, Name, Focus Area, Term Paper Title
Date: Jan 8, Jan 10, Jan 13, Jan 15, Jan 18, Jan 18, Jan 18, Jan 19, Jan 20, Jan 23
Name: Jason Stone, Ken Brown, Basudha Pradhan, Karl Gunderson, Krishnakanth Ireddynaga, Jianfei Wu, Chaitanya Dumpala, Loai Al-Nimer, Samuel Kondamarri, Dibakar Bhowmick
Focus Area / Term Paper Title: Automatic Alerters in S. E. Financial Data Loan Payment Prediction/Classification for Customer Credit Policy Analysis Transactional Data Two products interaction (co-occurrence at checkout x) to maximize sales pattern representation, comparison and analysis DATA MINING THE WORLD WIDE WEB Stock Data Pattern Recognition, Classification and performance analysis Using Markov modelling Techniques Inter Entity Correlation Correlating protein domains and bacterial properties for entire bacterial genomes" Software Engineering Music and Musical instrument data analysis