Faculty of Economics and Business Administration
Str. Teodor Mihali nr. 58-60, Cluj-Napoca, RO-400951
Tel.: 0264-41.86.52-5
Fax: 0264-41.25.70
econ@econ.ubbcluj.ro
www.econ.ubbcluj.ro

DETAILED SYLLABUS
Big Data and Web Computing

1. Information about the study program
1.1 University: Babeş-Bolyai University
1.2 Faculty: Faculty of Economics and Business Administration
1.3 Department: Business Information Systems
1.4 Field of study: Business Information Systems
1.5 Program level (bachelor or master): Master
1.6 Study program / Qualification: Business Modeling and Distributed Computing

2. Information about the subject
2.1 Subject title: Big Data and Web Computing
2.2 Course activities professor: Assoc. prof. Ioan Petri
2.3 Seminar activities professor: Assoc. prof. Ioan Petri
2.4 Year of study: I
2.5 Semester: II
2.6 Type of assessment: Colloquium
2.7 Subject regime: Mandatory

3. Total estimated time (teaching hours per semester)
3.1 Number of hours per week: 4, out of which: 3.2 course: 2; 3.3 seminar/laboratory: 2
3.4 Total number of hours in the curriculum: 56, out of which: 3.5 course: 28; 3.6 seminar/laboratory: 28

Time distribution | Hours
Study based on textbook, course support, references and notes | 35
Additional documentation in the library, through specialized databases and field activities | 35
Preparing seminars/laboratories, essays, portfolios and reports | 35
Tutoring | 10
Assessment (examinations) | 4
Other activities | -

3.7 Total hours for individual study: 119
3.8 Total hours per semester: 175
3.9 Number of credits: 7

4. Preconditions (if necessary)
4.1 Curriculum: Distributed systems
4.2 Skills: Moderate programming skills (Java or another object-oriented language)

5. Conditions (if necessary)
5.1 For course development: Projector
5.2 For seminar/laboratory development: Access to a large-scale computing infrastructure

NOTE: This document represents an informal translation performed by the faculty.

6.
Acquired specific competences

Professional competences
• Understand, use, and build practical big data analytics and management systems, and recognize the role of the different operators used.
• Gain a basic understanding of the issues and problems involved in massive on-line repository systems, and a knowledge of the currently practical techniques for satisfying the needs of such systems.
• Implement heuristics adapted to specific problems.
• Recognize the current research approaches that are likely to provide a basis for tomorrow's solutions.
• Draw analogies with existing technologies and their implications for societal developments.

Transversal competences
• Determine the impact of big data and web computing.
• Identify the economic implications of big data.
• Associate big data and web computing with social networks.
• Identify the major research topics in the field of big data analysis.

7. Subject objectives (arising from the acquired specific competences)
7.1 Subject's general objective
Build on the concept of distributed computing and understand the repercussions of large-scale computing infrastructures in terms of processing, storage and analysis.
7.2 Specific objectives
• Test various analytics algorithms and determine their benefits for tasks such as recommendation and classification.
• Adapt the knowledge to modern computing technologies and identify the economic options and business models of big data.
• Present a more advanced perspective that goes beyond the technological implications of modern technologies, covering social influence, web collaboration and data analytics.

8. Contents
8.1 Course
Topic | Teaching methods | Observations
1. Introduction to Big Data and Web Computing | Lectures/examples | 1 lecture
2. Technologies & Techniques for Big Data and Web Computing | Lectures/examples | 1 lecture
3. Recommendation algorithms | Lectures/examples | 1 lecture
4. Clustering algorithms | Lectures/examples | 1 lecture
5. Classification algorithms | Lectures/examples | 1 lecture
6. Graph computing: Graph Theory and Groups | Lectures/examples | 2 lectures
7. Graph analytics in Social Networks | Lectures/examples | 1 lecture
8. Social Computing and networks analytics | Lectures/examples | 2 lectures
9. Mobile Data Collection, Analysis, and Interface | Lectures/examples | 1 lecture
10. Web and Future Internet | Lectures/examples | 1 lecture
11. Business Models for Big Data | Lectures/examples | 1 lecture
12. Advanced Big Data Analytics | Lectures/examples | 1 lecture

References:
1. Jimmy Lin and Chris Dyer, Data-Intensive Text Processing with MapReduce, Morgan & Claypool Publishers, 2010. http://lintool.github.com/MapReduceAlgorithms/ [Mandatory]
2. Maarten van Steen, Graph Theory and Complex Networks, 2010. [Mandatory]
3. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, April 2005. [Mandatory]
4. Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, Cambridge University Press, http://infolab.stanford.edu/~ullman/mmds/book.pdf [Mandatory]
5. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, The Morgan Kaufmann Series in Data Management Systems (Jim Gray, Series Editor), Morgan Kaufmann Publishers, August 2000. 550 pages. ISBN 1-55860-489-8. [Optional]
6. Maksim Tsvetovat and Alexander Kouznetsov, Social Network Analysis for Startups: Finding Connections on the Social Web, O'Reilly Media, 2007. [Optional]

8.2 Seminar/laboratory
Topic | Teaching methods | Observations
1. Installation of the development environments: Hadoop and PeerSim | Examples/exercises | 2 laboratories
2.
Simulation of the P2P community | Examples/exercises | 1 laboratory
3. Testing different network topologies and data analysis | Examples/exercises | 2 laboratories
4. Algorithm implementations | Examples/exercises | 3 laboratories
5. Applying graph analytics | Examples/exercises | 2 laboratories
6. Hubs, Centrality, Connectivity | Examples/exercises | 2 laboratories
7. Analysis APIs | Examples/exercises | 2 laboratories
8. Cost management for big data | Examples/exercises | 1 laboratory

References:
1. Tian Zhang, Raghu Ramakrishnan, and Miron Livny, BIRCH: A New Data Clustering Algorithm and Its Applications, Data Mining and Knowledge Discovery, Volume 1, Issue 2, 1997, pp. 141-182.
2. Indranil Palit and Chandan K. Reddy, Scalable and Parallel Boosting with MapReduce, IEEE Transactions on Knowledge and Data Engineering (TKDE), Vol. 24, No. 10, pp. 1904-1916, October 2012.
3. X.S. Yang, Nature Inspired Meta-heuristic Algorithms, Luniver Press, 2010.
4. Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2009.
5. T. L. Griffiths and M. Steyvers, Finding Scientific Topics, Proceedings of the National Academy of Sciences, 101, pp. 5228-5235, 2004.

9. Corroboration / validation of the subject's content in relation to the expectations of representatives of the epistemic community, of the professional associations and of the representative employers in the program's field
In many areas and domains, data are generated at a speed never experienced before. Given the large amount of data, one fundamental scientific challenge is how to develop efficient and effective computational tools that analyze the data, reveal insights and make predictions. Data analytics is the science of achieving these goals.

10.
Assessment (examination)

Type of activity | 10.1 Assessment criteria | 10.2 Assessment methods | 10.3 Weight in the final grade
10.4 Course | Understand big data and distinguish the various analysis algorithms | Written exam | 0.4
10.5 Seminar/laboratory | Setting up the testing environments for data collection and implementing the analytics algorithms | Group project containing: environment configuration; system design and network architecture; algorithm implementations | 0.6

10.6 Minimum performance standard
• Students must demonstrate involvement and interest in both the lectures and the laboratory exercises.
• A minimum grade of 5 on both assessment methods is required.
• Students must comply with the project requirements.

Date of completion: 26 January 2015
Signature of the course professor: Assoc. prof. Ioan Petri
Signature of the seminar professor: Assoc. prof. Ioan Petri

Date of approval by the department: 28 January 2015
Head of department's signature: Prof. Dr. Gheorghe Cosmin Silaghi