National Research University 'Higher School of Economics'
Discipline program "Advanced methods of data analysis and big data in business intelligence" for
direction 080500.68 "Business Informatics", Master's training
Government of the Russian Federation
Federal State Autonomous Educational Institution of Higher Professional Education
"National Research University 'Higher School of Economics'"
Faculty of Business Informatics
Discipline program
"Advanced data management"
for direction 38.04.05 "Business Informatics", Master's training, Master's program "Big Data Systems"
Program’s author:
Nikolay V. Markov, nikolay.markoff@gmail.com
Approved at the meeting of the Department of
information and business in the sphere of information technologies
Head of Department, Svetlana V. Maltseva
«____»____________ 2014
_____________________
Recommended by the EMS section «Business Informatics»
«____»____________ 2014
Chairman, Y. V. Taratukhina
____________________
Moscow, 2014
This program cannot be used by other departments of the university or by other institutions of higher
education without the permission of the department that developed it.
1. Scope and normative references
This program of the academic discipline establishes the minimum requirements for students' knowledge
and skills and determines the content and types of classes and assessment.
The program is intended for teachers of this discipline, teaching assistants, and students of
direction 080500.68 "Business Informatics" (Master's training) enrolled in the Master's program
"Business Informatics".
The program is developed in accordance with:
• the educational standard of the Federal State Autonomous Educational Institution of
Higher Professional Education "National Research University 'Higher School of
Economics'", level of training: Master, approved on 26.06.2011;
• the working curriculum of the University for direction 080500.68 "Business Informatics", Master's
training, Master's program "Big Data Systems", approved in 2013.
2. Goals of the discipline
• Build theoretical knowledge and practical skills in the collection, storage,
processing and analysis of big data.
• Develop practical skills in analyzing big data for a wide range of applications,
including the analysis of corporate data, geospatial data, social media data and network data.
3. Student competences formed as a result of studying the discipline
As a result of studying the discipline, a student should:
• Understand the theory and fundamentals of storing, processing and analyzing big data, and
the advanced tools for collecting, storing, transmitting and visualizing big data.
• Be able to process and analyze large amounts of data using the R language.
• Have skills in applying SMM methods and numerical methods to improve data analysis and
data management.
As a result of studying the discipline, the student acquires the following competences:

Competence | GEF/NRU code | Descriptors: the main features of development (indicators of achievement) | Forms and methods of teaching that contribute to the formation and development of the competence
Ability to offer concepts and models, and to invent and test methods and tools of professional activity | СК-2 | Demonstrates | Lectures, workshops, homework
Ability to apply the methods of system analysis and modeling to evaluate and design | ПК-13 | Owns and uses | Lectures, workshops, homework
Ability to develop and apply mathematical models to justify design decisions in the field of ICT | ПК-14 | Owns and uses | Lectures, workshops, homework
Ability to organize independent and collective research work at the enterprise and manage it | ПК-16 | Owns and uses | Lectures, workshops, homework
4. Place of the discipline in the structure of the educational program
Within the Master's program "Big Data Systems" this discipline is a compulsory subject.
To master the discipline, students should:
• Know the content of the following disciplines: numerical methods, optimization
methods, data analysis, databases, discrete mathematics, theoretical foundations of
computer science, computer systems, networks, telecommunications, and information
systems for managing a production company.
• Be able to use mathematical and IT tools for management tasks.
The main provisions of the discipline are used in the further study of the discipline
"Elaboration and implementation of big data."
5. Topical plan of the academic discipline

№ | Topic name | Total hours | Lectures | Seminars / Workshops | Homework
1 | Introduction to advanced data management | 16 | 4 | 4 | 8
2 | Cloud services usage for data management | 15 | 2 | 2 | 11
3 | Social media management. Gathering and analysis of social network data. Facebook | 15 | 2 | 2 | 11
4 | Social media management. Gathering and analysis of social network data. Twitter | 15 | 2 | 2 | 11
5 | Using R for web analysis | 14 | 2 | 2 | 10
6 | Gathering, analysis and visualization of geospatial data | 12 | 2 | 2 | 8
7 | Properties of data storage in relational and non-relational databases | 19 | 4 | 4 | 11
8 | Distributed data processing | 24 | 4 | 4 | 16
9 | R language tools for big data visualization | 14 | 2 | 2 | 10
  | Total | 144 | 24 | 24 | 96
6. Forms of student knowledge control
1st year

Type of control | Form of control | Parameters
Current (week) | Practical tasks | Practical tasks, result evaluation – 1 week
Final (week) | Exam | Oral exam, 20 min per student
6.1 Criteria for assessing knowledge and skills
The student should demonstrate knowledge of the sections of the discipline and the ability to
present the results of homework and tests in accordance with the required competencies.
All forms of control are graded on a 10-point scale.
The final grade for the discipline is composed of the grades for:
• practical work – O1
• exam – O2
according to the formula: Oi = 0.5 * O1 + 0.5 * O2
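As a worked illustration of the formula above, the following minimal Python sketch computes the final grade from the two component grades. The function name, the range check (0 to 10) and the absence of rounding are assumptions made for the example; the program itself does not state a rounding rule.

```python
def final_grade(practical: float, exam: float) -> float:
    """Combine the practical-work grade (O1) and the exam grade (O2) with equal weights."""
    for name, value in (("practical", practical), ("exam", exam)):
        # Assumption: grades lie between 0 and 10 on the 10-point scale.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} grade must be on the 10-point scale, got {value}")
    return 0.5 * practical + 0.5 * exam


# Example: 8 for practical work and 6 for the exam.
print(final_grade(8, 6))  # 7.0
```

Because both weights are 0.5, the final grade is simply the average of the two components.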
7. Program content
Topic 1. Introduction to the analysis and management of big data
What is data management? Areas of data management. Data architecture, design and analysis.
Database management. Data security management. Data quality management. The value of
data management in business intelligence. Content management. Metadata management.
Management and analysis of media data.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 2. Cloud services usage for data management
What is cloud computing? Capabilities of cloud data storage and processing. Cloud service models:
SaaS, PaaS, IaaS. Using the R language with the cloud services of Amazon, Google and IBM.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 3. Social media management. Gathering and analysis of social network data. Facebook
What is SMM? Why does social media management matter for the modern enterprise? Types of
data from social networks. Processing techniques for social media data. Features of the Facebook API.
Loading, storing and analyzing Facebook data with the R language.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 4. Social media management. Gathering and analysis of social network data. Twitter
The importance of Twitter data for business. Twitter and SMM.
Features of the Twitter API. Loading, storing and analyzing Twitter data with the R language.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 5. Using R for web analysis
What is web scraping? Google Analytics and other means of collecting web data. Types of
data obtained from the web. R language tools for web scraping.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 6. Gathering, analysis and visualization of geospatial data
What is geospatial data? The main sources of geospatial data. Geospatial data in social
networks and its role in business development. Using geospatial data to build social
network graphs. R language tools for collecting, analyzing and visualizing geospatial
data.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 7. Properties of data storage in relational and non-relational databases
Features of data storage in relational and non-relational databases. Choosing a storage system
based on the architecture and characteristics of the enterprise. MySQL and NoSQL: similarities and
fundamental differences. R language tools for working with MySQL and NoSQL.
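The contrast between relational and non-relational storage can be sketched in a few lines. The example below is only an illustration under stated assumptions: it is written in Python rather than R, uses the standard-library SQLite engine as a stand-in for MySQL, and models a document store with plain JSON documents. The table name, field names and sample records are invented for the example.

```python
import json
import sqlite3

# Relational side: a fixed schema declared up front, rows queried with SQL.
# SQLite (in memory) stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Anna", "Moscow"), (2, "Boris", "Kazan")],
)
relational_result = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", ("Moscow",)
    )
]
conn.close()

# Non-relational (document) side: each record is a self-describing JSON
# document, so the set of fields may differ from one record to the next.
documents = [
    json.loads('{"id": 1, "name": "Anna", "city": "Moscow", "tags": ["vip"]}'),
    json.loads('{"id": 2, "name": "Boris"}'),
]
document_result = [d["name"] for d in documents if d.get("city") == "Moscow"]

print(relational_result, document_result)  # ['Anna'] ['Anna']
```

Both queries return the same answer; the difference lies in where the structure is enforced: in the schema for the relational table, and in the application code for the documents.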
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 8. Distributed data processing
The need for distributed processing of big data. MapReduce and Hadoop. The scheme of
distributed data processing. R language tools for parallel computing and for data processing with
Hadoop.
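The MapReduce scheme mentioned above can be illustrated with the classic word-count task. This is a minimal single-process sketch in Python, not Hadoop code and not the R tooling used in the course; the function names are chosen for the example. It only shows the three logical phases (map, shuffle, reduce) that a framework such as Hadoop would distribute across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


print(word_count(["big data big", "data management"]))
# {'big': 2, 'data': 2, 'management': 1}
```

In a real cluster each input line (or file block) would be mapped on a different node, and the shuffle step would move the intermediate pairs over the network so that all values for one key reach the same reducer.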
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Topic 9. R language tools for big data visualization
Features of big data visualization. Types and methods of visualizing structured, semi-structured
and unstructured data. R language tools for visualizing big data.
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
8. Literature
Basic literature
1. Ohri A. R for Cloud Computing: An Approach for Data Scientists. Springer Science, New
York, 2014
2. Cotton R. Learning R. O’Reilly Media, 2013
3. Venables W. N., Smith D. M. and the R Core Team. An Introduction to R. 2015
Additional literature
4. Abiteboul S., Manolescu I., Rigaux P., Rousset M. C., Senellart P. Web Data Management.
Cambridge University Press, 2011
5. Leskovec J., Rajaraman A., Ullman J. D. Mining of Massive Datasets. Stanford
University, 2010
Developers:
NRU-HSE________ _______professor________ _____Nikolay V. Markov
(workplace)
(position)
(initials, surname)