National Research University "Higher School of Economics"
Discipline program "Advanced methods of data analysis and big data in business intelligence" for direction 38.04.05 "Business Informatics", Master training

Government of the Russian Federation
Federal State Autonomous Educational Institution of Higher Professional Education
"National Research University 'Higher School of Economics'"

Discipline program "Advanced data management"
for direction 38.04.05 "Business Informatics", Master training,
for the Master program "Big Data Systems"

Program's author: Nikolay V. Markov, nikolay.markoff@gmail.com

Moscow, 2015

This program cannot be used by other departments of the university or by other institutions of higher education without the permission of the department that developed the program.

1. Scope and normative references

This program of an academic discipline establishes minimum requirements for the knowledge and skills of the student and determines the content and types of studies and reports.

The program is intended for teachers leading this discipline, teaching assistants, and students of direction 080500.68 "Business Informatics" (Master training) enrolled in the master's program "Business Informatics".

The program is developed in accordance with:
- the educational standards of the Federal State Autonomous Educational Institution of Higher Professional Education "National Research University 'Higher School of Economics'", level of training: Master, approved on 26.06.2011;
- the working curriculum of the University for direction 38.04.05 "Business Informatics", Master training, master's program "Big Data Systems", approved in 2015.

2. Goals for studying

Formation of theoretical knowledge and practical skills in the collection, storage, processing and analysis of big data.
Development of practical skills in analyzing big data for a wide range of applications, including the analysis of corporate data, geospatial data, social media and network data.

3. Student competences, generated as a result of studying

As a result of studying the discipline, a student should:
- understand the theory and fundamentals of the storage, processing and analysis of big data, and advanced tools for the collection, storage, transmission and visualization of big data;
- be able to process and analyze large amounts of data using R language code;
- have the skills to use SMM methods and numerical methods for better data analysis and data management.

As a result of studying the discipline, the student acquires the following competences:

Competence | GEF/NRU code | Descriptors: the main features of the development (indicators of achievement of results) | Forms and methods of teaching that contribute to the formation and development of the competence
Ability to offer concepts, models, invent and test methods and tools of professional activity | СК-2 | Demonstrates | Lectures, workshops, homework
Ability to apply the methods of system analysis and modeling to evaluate and design | ПК-13 | Owns and uses | Lectures, workshops, homework
Ability to develop and apply mathematical models to justify design decisions in the field of ICT | ПК-14 | Owns and uses | Lectures, workshops, homework
Ability to organize individual and collective research work at the enterprise and manage it | ПК-16 | Owns and uses | Lectures, workshops, homework

4.
Place of the discipline in the structure of the educational program

Within the master's program "Big Data Systems", this discipline is a compulsory subject.

To master the discipline, students should:
- know the content of the following disciplines: numerical methods, optimization methods, data analysis, databases, discrete mathematics, theoretical foundations of computer science, computer systems, networks, telecommunications, information systems management and production company;
- be able to use mathematical and IT tools for management tasks.

The main provisions of the discipline should be used in the further study of the discipline "Elaboration and implementation of big data."

5. Topical plan of an academic discipline

No. | Topic name | Total hours | Lectures | Seminars / workshops | Homework
1 | Introduction to the advanced data management | 16 | 4 | 4 | 8
2 | Cloud services usage for data management | 15 | 2 | 2 | 11
3 | Social media management. Gathering and analysis of social network data. Facebook | 15 | 2 | 2 | 11
4 | Social media management. Gathering and analysis of social network data. Twitter | 15 | 2 | 2 | 11
5 | Using R for web-analysis | 14 | 2 | 2 | 10
6 | Gathering, analysis and visualization of geospatial data | 12 | 2 | 2 | 8
7 | Properties of data storing in relational and non-relational databases | 19 | 2 | 6 | 11
8 | Distributed data processing | 26 | 2 | 6 | 18
9 | R language sources for big data visualization | 20 | 2 | 2 | 16
Total | | 152 | 20 | 28 | 104

6.
Forms of students' knowledge control

Type of control | Form of control | 1st year | Parameters
Current (week) | Practical tasks | 1 1 | Practical tasks, result evaluation: 1 week
Total (week) | Exam | 2 | 1 Oral exam, 20 min per student

6.1 Criteria for assessing the knowledge, skills

The student should demonstrate knowledge of the sections of the discipline and the ability to present the results of homework and tests in accordance with the required competencies.

All forms of control are graded on a 10-point scale.

The final grade for the discipline is composed of the grades for practical work (O1) and the exam (O2) according to the formula:

Oi = 0.5 * O1 + 0.5 * O2

7. Program content

Topic 1. Introduction to the analysis and management of big data

What is data management? Sections of data management. The architecture, design and analysis of data. Management of databases. Data security management. Data quality management. The value of data management in business intelligence. Content management. Metadata management. Management and analysis of media data.

Basic literature
1. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
2. Cotton R. Learning R. O'Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 2.
Data Management

What is cloud computing? Power of cloud data storage and processing. Cloud computing. SaaS, PaaS, IaaS. Capabilities of the R language as applied to the cloud services of Amazon, Google and IBM.

Basic literature
6. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
7. Cotton R. Learning R. O'Reilly Media, 2013
8. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
9. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
10. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 3. Model of distributed file systems and databases computing

What is SMM? Why does the modern enterprise need social network management? Types of data from social networks. Processing techniques for social media data. Features of the Facebook API. Loading, storage and analysis of Facebook data using the R language.

Basic literature
11. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
12. Cotton R. Learning R. O'Reilly Media, 2013
13. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
14. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
15. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 4. Search for similarities in the data

The importance of data from the social network Twitter for business. Twitter and SMM. Features of the Twitter API. Loading, storage and analysis of Twitter data using the R language.

Basic literature
16. Ohri A. R for Cloud Computing.
An Approach for Data Scientists. Springer Science, New York, 2014
17. Cotton R. Learning R. O'Reilly Media, 2013
18. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
19. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
20. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 5. Analysis of streaming data

What is web scraping? Google Analytics and other means of collecting web data. Types of data received from the network. R language tools for web scraping.

Basic literature
21. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
22. Cotton R. Learning R. O'Reilly Media, 2013
23. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
24. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
25. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 6. Link analysis

What is geospatial data? The main sources of geospatial data. Geospatial data in social networks and their role in business development. The use of geospatial data for the formation of social network graphs. R language tools for the collection, analysis and visualization of geospatial data.

Basic literature
26. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
27. Cotton R. Learning R. O'Reilly Media, 2013
28. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
29. Abiteboul S., Manolescu I., Rigaux P., Rousset M.
C., Senellart P. Web Data Management. Cambridge University Press, 2011
30. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 7. Frequent datasets analysis

Features of data storing in relational and non-relational databases. Justification of the choice of storage based on architecture and enterprise features. MySQL and NoSQL: similarities and fundamental differences. R language tools for working with MySQL and NoSQL.

Basic literature
31. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
32. Cotton R. Learning R. O'Reilly Media, 2013
33. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
34. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
35. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 8. Clustering algorithms and their applications

The need for distributed processing of big data. MapReduce and Hadoop. Diagram of distributed data processing. R language tools for parallel computing and data processing using Hadoop.

Basic literature
36. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
37. Cotton R. Learning R. O'Reilly Media, 2013
38. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
39. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
40. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Topic 9.
Neural networks and their applications

Features of the visualization of big data. Types and methods of visualization of structured, semi-structured and unstructured data. R language tools for visualizing big data.

Basic literature
41. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
42. Cotton R. Learning R. O'Reilly Media, 2013
43. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
44. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
45. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

8. Literature

Basic literature
46. Ohri A. R for Cloud Computing. An Approach for Data Scientists. Springer Science, New York, 2014
47. Cotton R. Learning R. O'Reilly Media, 2013
48. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015

Additional literature
49. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management. Cambridge University Press, 2011
50. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford University, 2010

Developers: NRU-HSE (workplace), professor (position), Nikolay V. Markov (initials, surname)