Facultatea de Științe Economice și Gestiunea Afacerilor
Str. Teodor Mihali nr. 58-60, Cluj-Napoca, RO-400951
Tel.: 0264-41.86.52-5
Fax: 0264-41.25.70
econ@econ.ubbcluj.ro
www.econ.ubbcluj.ro

DETAILED SYLLABUS
Methods in Data Science

NOTE: This document represents an informal translation performed by the faculty.

1. Information about the study program
1.1 University: Babeș-Bolyai
1.2 Faculty: Economic Sciences and Business Administration
1.3 Department: Business Information Systems
1.4 Field of study: Business Information Systems
1.5 Program level (bachelor or master): Master
1.6 Study program / Qualification: Business Modeling and Distributed Computing

2. Information about the subject
2.1 Subject title: Methods in Data Science
2.2 Course activities professor: Lect. Dr. Darie Moldovan
2.3 Seminar activities professor: Lect. Dr. Darie Moldovan
2.4 Year of study: I
2.5 Semester: I
2.6 Type of assessment: Summative
2.7 Subject regime: Mandatory

3. Total estimated time (teaching hours per semester)
3.1 Number of hours per week: 4, out of which:
    3.2 course: 2
    3.3 seminar/laboratory: 2
3.4 Total number of hours in the curriculum: 56, out of which:
    3.5 course: 28
    3.6 seminar/laboratory: 28

Time distribution (hours):
Study based on textbook, course support, references and notes: 38
Additional documentation in the library, through specialized databases and field activities: 24
Preparing seminars/laboratories, essays, portfolios and reports: 45
Tutoring: 8
Assessment (examinations): 4
Other activities: 0

3.7 Total hours for individual study: 119
3.8 Total hours per semester: 175
3.9 Number of credits: 7

4. Preconditions (if necessary)
4.1 Curriculum: Not necessary
4.2 Skills: Basic programming skills, basic statistics knowledge

5. Conditions (if necessary)
5.1 For course development: Notebook, video projector, Internet connection
5.2 For seminar / laboratory development: Computers with Internet connection

6. Acquired specific competences
Professional competences: Obtain key competences in data science:
- Cleaning and sampling data sets
- Data management
- Exploratory data analysis
- Prediction based on statistical methods
- Communication of results
Transversal competences: Gain competences in working within a team and dividing tasks, and be able to learn from different areas connected to the problem addressed.

7. Subject objectives (arising from the acquired specific competences)
7.1 Subject's general objective: Students must be familiar with data science methods and work through a data science project end to end.
7.2 Specific objectives: Students have to:
- learn how to analyze a dataset
- be able to access big data
- explore data and generate hypotheses
- use specific methods such as regression and classification for prediction
- communicate the results of their research using visualization tools and summaries

8. Contents
8.1 Course
1. Introduction. Course overview. About Data Science.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 1 lecture.
2. Univariate linear regression. Applications.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 1 lecture.
3. Multivariate linear regression. Applications.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
4. Classification methods. Logistic regression. Decision trees.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 2 lectures.
5. Neural networks.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
6. Data visualization. Effective information visualization.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 1 lecture.
7. Applying learning algorithms. Data preprocessing.
   Teaching methods: Lecture, open discussion. Observations: 1 lecture.
8. Support Vector Machines.
   Teaching methods: Lecture, open discussion, case studies. Observations: 1 lecture.
9. Clustering.
   Teaching methods: Lecture, open discussion, demonstration. Observations: 1 lecture.
10. Solution deployment.
   Teaching methods: Lecture, open discussion, demonstration. Observations: 1 lecture.
11. Big Data and MapReduce.
   Teaching methods: Lecture, demonstration, open discussion. Observations: 2 lectures.
12. Large-scale data mining.
   Teaching methods: Lecture, demonstration. Observations: 1 lecture.

References:
1. Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2011.
2. Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning, Springer, 2009.
3. Jure Leskovec, Anand Rajaraman, Jeff Ullman, Mining of Massive Datasets, Cambridge, 2011.
4. Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Addison Wesley, 2006.
5. Richard Duda, Peter Hart, David Stork, Pattern Classification, 2nd ed., John Wiley & Sons, 2001.
6. Drew Conway, John Myles White, Machine Learning for Hackers: Case Studies and Algorithms to Get You Started, O'Reilly Media, 2012.
7. Tom Mitchell, Machine Learning, McGraw-Hill, 1997.
8. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2008.

8.2 Seminar/laboratory
Teaching methods for all laboratories: Running examples and individual exercises / Homework.
1. Demonstrative example case. Observations: 1 laboratory.
2. Building a simple linear regression model. Observations: 1 laboratory.
3. Multivariate linear regression in practice. Observations: 1 laboratory.
4. Classification methods. Naïve Bayes, decision trees, logistic regression. Observations: 2 laboratories.
5. Neural networks. Observations: 1 laboratory.
6. Data visualization tools. Observations: 1 laboratory.
7. Feature selection, sampling the datasets and other preprocessing operations. Observations: 1 laboratory.
8. Support Vector Machines. Observations: 1 laboratory.
9. Clustering. Observations: 1 laboratory.
10. Deploying the solution. Observations: 1 laboratory.
11. MapReduce tools. Observations: 3 laboratories.

References: the same as for the course (section 8.1).

9. Corroboration / validation of the subject's content in relation to the expectations coming from representatives of the epistemic community, of the professional associations and of the representative employers in the program's field.
The profession of data scientist has recently become very popular due to the growing volume of data available for analysis.
Increasing computational power has opened new possibilities for statisticians and other specialists working with data to enter a new field, automated data analysis, which requires interdisciplinary skills in statistics, machine learning and their applications.

10. Assessment (examination)
10.4 Course
    10.1 Assessment criteria: Multiple choice test grid
    10.2 Assessment methods: Multiple choice quiz
    10.3 Weight in the final grade: 40%
10.5 Seminar/laboratory
    Homework assignments. 10.3 Weight in the final grade: 20%
    End of semester project. 10.3 Weight in the final grade: 40%
10.6 Minimum performance standard
• Minimum 50% of points for the course component
• Minimum 50% of points for the seminar component

Date of filling: 28.01.2015
Signature of the course professor: Lect. Dr. Darie Moldovan
Signature of the seminar professor: Lect. Dr. Darie Moldovan
Date of approval by the department: 28.01.2015
Head of department's signature: Prof. habil. Dr. Gheorghe Cosmin Silaghi