Federal State Autonomous Educational Institution of Higher Professional Education "National Research University Higher School of Economics"

Faculty of Computer Science
Department of Software Engineering

Course Syllabus
Databases (in English)

for the educational programme "Software Engineering", field of study 09.03.04 "Software Engineering", level: bachelor

Syllabus author: Breyman A.D., Associate Professor, Candidate of Technical Sciences, abreyman@hse.ru

Approved at the meeting of the Department of Software Engineering, «___»____________ 2015
Head of the Department: Avdoshin S.M. ________________

Recommended by the Academic Council of the educational programme, «___»____________ 2015, minutes No. _________________

Approved «___»____________ 2015
Academic Supervisor of the educational programme: Shilov V.V. _________________

Moscow, 2015

This syllabus may not be used by other departments of the university or by other universities without the permission of the department that developed it.

National Research University Higher School of Economics
Syllabus of the course "Databases" for field of study 09.03.04 "Software Engineering", bachelor's level

1 Scope and Regulations

The syllabus is prepared for teachers responsible for the course (or closely related disciplines), teaching assistants, students enrolled in the course, and experts and statutory bodies carrying out assigned or regular accreditations. It is prepared in accordance with:
• the educational standard of the Federal State Autonomous Educational Institution of Higher Professional Education National Research University Higher School of Economics;
• the B.Sc. curriculum ("Software Engineering", area code 09.03.04), 3rd year, 2015-2016 academic year;
• the B.Sc. curriculum ("Software Engineering", area code 09.03.04), 4th year, 2015-2016 academic year.

The course is offered to students of the Bachelor Program «Software Engineering» (code 09.03.04) at the Faculty of Computer Science of the National Research University Higher School of Economics (HSE). This mandatory course belongs to the professional cycle curricula unit (Б.3.Б unit / Base module [Professional cycle disciplines Б.3] of the 2015-2016 academic year's working syllabus) covered by the list of training courses of the bachelor's program (3rd and 4th years of studies).

It is a four-module course, delivered in modules #3 and #4 of the third academic year and in modules #1 and #2 of the fourth academic year. The number of credits is 8. The total course length is 288 academic hours, including 120 classroom hours (60 lecture hours and 60 practice hours) and 168 self-study hours.

Academic control forms are two home assignments in the form of group projects (one in each year), one written exam after module #4 of the 3rd year, and one written exam after module #2 of the 4th year.

2 Course Objectives

The student should develop skills and understanding in:
• the design methodology for databases and verifying their structural correctness;
• implementing databases and application software in the relational model as well as in the map/reduce paradigm;
• using query languages, primarily SQL, and other database supporting software;
• applying the theory behind various database models and query languages;
• implementing security and integrity policies relating to databases;
• working in group settings to design and implement database projects.

The objective of the first part of the course (taught in the 3rd year) is to introduce students to (mostly) online transaction processing and relational database theory, database design and implementation, including entity-relationship data modeling, the relational model, relational algebra and calculus, functional dependencies and normalization theory, and relational query languages, including SQL.
The objective of the second part of the course (taught in the 4th year) is to form professional competencies related to the design and implementation of other kinds of databases, including data warehouses, online analytical data processing and big data management tools. Students will learn the strengths and weaknesses of a wide spectrum of approaches to data storage, search and retrieval, so that they can make an informed choice of database model.

This course studies different conceptual database models and their properties. For each of these conceptual models the course concentrates on the following points:
• Why was the database model introduced? Which of the shortcomings of other models does it address?
• What are the most important concepts and notions for the database model?
• How is the model implemented? Which are the main techniques?

Understanding the internals of a particular database model is essential, because its limitations follow directly from them.

3 Learning Outcomes (Competencies to be formed)

After taking this course the student:
• knows the data modeling concepts and understands the relational data model;
• can compose queries to relational databases using relational algebra, tuple relational calculus and SQL;
• knows methods of database design, including the entity-relationship approach and the normalization-based approach;
• knows and is able to apply database application design and development methods usable for object-oriented program systems, including object-relational mappers, with their advantages and disadvantages;
• knows models and methods of the internal organization of relational databases, including file storage, indexing, query processing and transaction management issues.
Students should be able to understand the language of the studied models, choose and use appropriate models and programming languages, and implement systems using the chosen models, methods and tools.

The course contributes to the development of the following competencies:

ПК-1 (Professional, Scientific research activities)
  Descriptor: The ability to apply main concepts, principles, theories and facts connected to computer science in the process of scientific research and problem solving.
  Education forms and methods: Lectures, Essay composing, Home assignments, Self-study.

ПК-2 (Professional, Scientific research activities)
  Descriptor: The ability to formalize in its subject area within the constraints of methods of research.
  Education forms and methods: Lectures, Practical studies in data modeling, Home assignments (data modeling part).

ПК-3 (Professional, Scientific research activities)
  Descriptor: Preparedness to use research techniques and tools to study objects of professional activity.
  Education forms and methods: Lectures, Practical studies in data model assessment and evaluation, Home assignments (solutions evaluation part).

ПК-6 (Professional, Analytical activities)
  Descriptor: The ability to formalize the subject area of a software project and to develop specifications of the software product components.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-10 (Professional, Project activities)
  Descriptor: The ability to design, develop and test software products.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-11 (Professional, Project activities)
  Descriptor: The ability to read, understand, and extract the main idea from source code and documentation.
  Education forms and methods: Practical studies, Industrial cases reviews.

ПК-12 (Professional, Project activities)
  Descriptor: Skills in modeling, analysis and use of formal methods of software.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-15 (Professional, Technology-oriented activities)
  Descriptor: Skills in using operating systems, network technologies, software interface development tools, languages and methods of formal specifications, database management systems.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-16 (Professional, Technology-oriented activities)
  Descriptor: Skills in using different software development technologies.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-17 (Professional, Technology-oriented activities)
  Descriptor: The ability to apply core software development methods and tools.
  Education forms and methods: Lectures, Practical studies, Home assignments.

ПК-19 (Professional, Technology-oriented activities)
  Descriptor: The ability to understand life cycle standards and models.
  Education forms and methods: Lectures, Practical studies, Home assignments.

4 The Course Position in the Curriculum

This mandatory course belongs to the professional cycle curricula unit (Б.3 unit / Base module [Professional cycle disciplines 1.3] of the 2015-2016 academic year's working syllabus) covered by the list of training courses of the bachelor's program (3rd year of studies).

Prerequisite courses: Computer Science (Informatics), Mathematical Logic and Theory of Algorithms, Discrete Mathematics, Programming, Software Construction, Algorithms Design and Analysis.

The course is based on knowledge of the foundations of discrete mathematics (including logic and set theory), computer science, and computer programming. Students are assumed to understand basic data structures and their uses (lists, arrays), concepts of object-oriented programming (classes, inheritance), modular software design and method linkage, and the design, implementation and testing of medium-sized problem solutions, and to be familiar with some contexts in which databases are used.

Courses that are based on this course: Information Systems, Program Projects Management, Group Project on Software Engineering, Joint Year Projects, Final Projects.
5 Course Plan

№   Topic title                                   Total  Lectures  Practice  Self-study
Module #3, 3rd year
1   Introduction                                      6         2         2           2
2   Data Modeling                                     6         2         2           2
3   Relational Model                                  6         2         2           2
4   Relational Query Languages                        8         2         2           4
5   SQL                                              22         4         2          14
6   Database Design: The E/R and UML Approaches      10         2         2           8
7   Relational Database Design                       14         2         2          10
    Module #3 totals                                 72        16        14          42
Module #4, 3rd year
8   Application Design and Development               20         4         4          12
9   Storage and File Structure                        8         2         2           4
10  Indexing and Hashing                              8         2         2           4
11  Query Processing                                 12         2         2           8
12  Transaction Management                           12         2         4           6
13  Distributed and Parallel Databases               12         2         2           8
    Module #4 totals                                 72        14        16          42
    3rd year totals                                 144        30        30          84
Module #1, 4th year
14  Data Warehousing and Big Data Management         10         2         2           6
15  Data Warehousing Architectures and Models        22         4         6          12
16  Data Cleaning and Integration                    20         4         4          12
17  Key/Value and Document Databases                 20         4         4          12
    Module #1 totals                                 72        14        16          42
Module #2, 4th year
18  Map/Reduce and Hadoop                            34         8         8          18
19  Large-scale Distributed Databases                14         4         2           8
20  In-memory Databases                              12         2         2           8
21  Data Streams Management                          12         2         2           8
    Module #2 totals                                 72        16        14          42
    4th year totals                                 144        30        30          84
    TOTAL                                           288        60        60         168

6 Assessment

Current control:
• Test: 7th week of module 3 of the 3rd year and 7th week of module 2 of the 4th year. Written test, 30 min.
• Quiz: each week, first 10 minutes of the lecture.
• Essay: 4th week of each module. Written essay, up to 2 pages, homework.
• Homework presentation: group presentation of the home assignment followed by a demonstration.
Interim control:
• Exam: written exam, 90 min.
Final control:
• Exam: written exam, 90 min.
6.1 Guidelines for Knowledge Assessment

Home assignment 1 (HA1) has to be prepared in modules 3 and 4 of the 3rd year by students in groups of up to 5. It includes the design, implementation and testing of a database and a database application for a given subject area (chosen by the group and approved by the instructor, or assigned by the instructor). The results of home assignment 1 should be presented in the form of a report that consists of a design document, an implementation description and the results of testing. Mandatory appendixes are the source code of the application and the database creation script. The report should be submitted to the LMS no later than 7 calendar days before the assigned date of its presentation (in the last week of module 4). The project should be presented and demonstrated by all group members. Each group member should demonstrate complete understanding of all project details and give correct answers to at least two questions from the instructor.

The written test at the end of the first module (last week of module 3 of the 3rd year of study) is held in the lecture room for all students enrolled in the course. The test covers all course material of the first module.

The written exam at the end of the second module (module 4 of the 3rd year of study) is held in the lecture room for all students enrolled in the course. The exam covers all course material of the first two modules.

Home assignment 2 (HA2) has to be prepared in modules 1 and 2 of the 4th year by students in groups of up to 5. Students should identify a Big Data problem that they would like to work on, specify a difficult question to ask of the data, and identify data sources that might help answer the question.
The project result should contain two (possibly partial) solutions for the identified problem: a) a data warehouse-based solution including OLAP options and b) a Hadoop, MongoDB or Cassandra-based solution. The two solutions should be compared, and their relative strengths and weaknesses identified and described. The chosen Big Data problem should be approved by the instructor. The results of HA2 should be presented in the form of a project report that consists of a design document, an implementation description, the results of testing and a comparison of the solutions. Mandatory appendixes are the source code of both applications and the data management scripts. The report should be submitted to the LMS no later than 7 calendar days before the assigned date of its presentation (in the last week of module 2 of the 4th year). The project should be presented and demonstrated by all group members. Each group member should demonstrate complete understanding of all project details and give correct answers to at least two questions from the instructor.

6.2 Grading System

Rounding procedure for grades (where applicable): grades are rounded to an integer number of points.

Practice activity during practice hours is assessed by evaluating the student's involvement in discussions as well as the quality of exercise performance during practice. The practice activity grades $O_{\text{classroom 3rd year}}$ and $O_{\text{classroom 4th year}}$ use a ten-point scale.

Students have to write an essay of up to 2 pages (on a topic proposed by the instructor at the first lecture) once in each module (due in the 4th week of each module). The grades $O_{\text{essays 3rd year}}$ and $O_{\text{essays 4th year}}$ are the arithmetic averages of the two 3rd-year and the two 4th-year essay grades respectively (ten-point scale, rounded to an integer number of points).

Students have to answer quiz questions in the first 10 minutes of each lecture. The grades $O_{\text{quiz 3rd year}}$ and $O_{\text{quiz 4th year}}$ are the arithmetic averages of the quiz grades of the respective year (ten-point scale, rounded to an integer number of points).
The average is calculated by dividing the sum of all of the student's quiz grades by the total number of quizzes in the year.

Students have to take a written test in the 7th week of module 3 of the 3rd year and in the 7th week of module 2 of the 4th year. The grades $O_{\text{test 3rd year}}$ and $O_{\text{test 4th year}}$ for these tests use a ten-point scale.

The value of $O_{\text{homework 3rd year}}$ (home assignment 1), a component of the final grade formula, is an integer from the interval [0, 10]. It is the sum of the common score for the report and presentation (from 0 to 5; the same score for all group members) and the individual student score for the answers to the questions (from 0 to 5). If a student misses the project presentation for a valid reason, s/he receives an «absence» grade. If a student misses the project presentation for any other reason, s/he receives a grade in which the individual score is set to 0.

The value of $O_{\text{homework 4th year}}$ (home assignment 2), a component of the final grade formula, is an integer from the interval [0, 10]. It is the sum of the common score for the report and presentation (from 0 to 5; the same score for all group members) and the individual student score for the answers to the questions (from 0 to 5). If a student misses the project presentation for a valid reason, s/he receives an «absence» grade. If a student misses the project presentation for any other reason, s/he receives a grade in which the individual score is set to 0.

The written test at the end of the first module (last week of module 3 of the 3rd year of study) is assessed on the usual ten-point scale. The interim written exam at the end of the second module (module 4 of the 3rd year of study), $O_{\text{interim exam}}$, is assessed on the usual ten-point scale. The final written exam at the end of the fourth module (module 2 of the 4th year of study), $O_{\text{final exam}}$, is assessed on the usual ten-point scale.
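The component grades described above can be illustrated with a short sketch. It is not an official grading procedure: the function names are hypothetical, and rounding halves up to the nearest integer is an assumption, since the syllabus only states that grades are rounded to an integer number of points. The «absence» grade for a presentation missed for a valid reason is not modeled.

```python
from __future__ import annotations

import math


def round_grade(value: float) -> int:
    """Round to the nearest integer, halves up (ASSUMED rounding rule)."""
    return math.floor(value + 0.5)


def quiz_grade(quiz_grades: list[int], total_quizzes: int) -> int:
    """Sum of the student's quiz grades divided by the total quiz count.

    Quizzes the student did not answer contribute nothing to the sum,
    but still count in the denominator.
    """
    if total_quizzes <= 0 or len(quiz_grades) > total_quizzes:
        raise ValueError("total_quizzes must cover all graded quizzes")
    return round_grade(sum(quiz_grades) / total_quizzes)


def essays_grade(first_essay: int, second_essay: int) -> int:
    """Arithmetic average of the two essay grades of one year."""
    return round_grade((first_essay + second_essay) / 2)


def homework_grade(group_score: int, individual_score: int,
                   presented: bool = True) -> int:
    """Homework grade in [0, 10]: group score (0-5) + individual score (0-5).

    If the presentation is missed without a valid reason, the individual
    score is set to 0 and only the group score remains.
    """
    if not (0 <= group_score <= 5 and 0 <= individual_score <= 5):
        raise ValueError("each score must be between 0 and 5")
    return group_score + (individual_score if presented else 0)


if __name__ == "__main__":
    print(quiz_grade([7, 8], total_quizzes=2))
    print(essays_grade(6, 9))
    print(homework_grade(4, 5))
    print(homework_grade(4, 5, presented=False))
```

Keeping the rounding rule in a single helper makes it easy to swap in a different rule if the department interprets "rounding" differently.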
The cumulative grade for the student's current results in the 3rd year is calculated using the following formula:

$$O_{\text{cumulative 3rd year}} = 0.7 \cdot O_{\text{current 3rd year}} + 0.3 \cdot O_{\text{classroom 3rd year}},$$

where

$$O_{\text{current 3rd year}} = 0.2 \cdot O_{\text{essays 3rd year}} + 0.2 \cdot O_{\text{test 3rd year}} + 0.2 \cdot O_{\text{quiz 3rd year}} + 0.6 \cdot O_{\text{homework 3rd year}}.$$

The interim grade for the 3rd year is calculated according to the following formula:

$$O_{\text{interim 3rd year}} = 0.5 \cdot O_{\text{current 3rd year}} + 0.5 \cdot O_{\text{interim exam}}.$$

The cumulative grade for the student's current results in the 4th year is calculated using the following formula:

$$O_{\text{cumulative 4th year}} = 0.7 \cdot O_{\text{current 4th year}} + 0.3 \cdot O_{\text{classroom 4th year}},$$

where

$$O_{\text{current 4th year}} = 0.2 \cdot O_{\text{essays 4th year}} + 0.2 \cdot O_{\text{test 4th year}} + 0.2 \cdot O_{\text{quiz 4th year}} + 0.6 \cdot O_{\text{homework 4th year}}.$$

The final cumulative grade is calculated using the following formula:

$$O_{\text{cumulative final}} = \left(O_{\text{interim 3rd year}} + O_{\text{cumulative 4th year}}\right) / 2.$$

The final grade is calculated using the following formula:

$$O_{\text{final}} = 0.5 \cdot O_{\text{cumulative final}} + 0.5 \cdot O_{\text{final exam}}.$$

7 Detailed Course Contents

Topic 1: Introduction.
♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
♦ Outline:
Course overview and logistics. History of data management approaches. Basic database system concepts. Database environment. Database users. Database development process. Database planning.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 1]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 1]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 2: Data Modeling.
♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
♦ Outline:
Three-level database architecture. Data model. Data independence. Inverted files. Early data models: hierarchical and network. Basic relational model concepts. Object-oriented model. Object-relational model. Semi-structured model. Semantic data models.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 1]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 1, 2.1]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 3: Relational Model.
♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
♦ Outline:
History of the relational model. Advantages of the relational model. Basic relational data structures. Mathematical and database relations. Relation schema. Relational database. Integrity constraints. Relation keys. Key constraint. Foreign key constraint.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 2]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 2.2]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 4: Relational Query Languages.
♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
♦ Outline:
Relational algebra. Relational algebra operations. Selection. Projection. Set operations. Renaming. Join, equijoin, antijoin, theta-join. Natural join. Division. Tuple relational calculus: atoms, formulas, queries. Domain relational calculus.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 6]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 5]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 5: Structured Query Language.
♦ Lectures: 4h. Practice: 2h. Self-study: 14h.
♦ Outline:
SQL language history. SQL standards. Data definition and data manipulation (sub)languages. SQL data types. Table declaration. Primary keys, unique constraints, default values, nullable attributes. Check constraints. Foreign key constraints. Handling foreign key violations. Indexes. Database schema modifications. SQL query sublanguage: SELECT. Single-table queries. Filtering conditions. Logical operations IN, ALL, EXISTS. Join queries. Join types: cross, natural, inner, outer, self. Duplicate elimination. Set operations. Nested queries. Correlated nested queries. Aggregate functions. Grouping and group filtering. Query result sorting. INSERT. UPDATE. DELETE. Views: creation, use and updating. Triggers: creation, activation, execution. Multiple triggers. View materialization. SQL procedural extensions and dialects: T-SQL, PL/SQL, PL/pgSQL. Stored procedures.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 3, 4, 5]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 6, 7, 8]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 6: Database Design: E/R and UML Approaches.
♦ Outline:
Entity-relationship model. Entities and attributes. Entity types. Keys. E/R diagram. Relationships. Attributes and roles. Relationship type, degree and cardinality. Relationship participation constraints. Strong and weak entities. Entity type hierarchies. Specialization and generalization. Total and partial unions. Unified Modeling Language class diagram. Association and aggregation in UML. Generalization hierarchies in UML. Multiplicity indicators in UML.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 7]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 4]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 7: Relational Database Design.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline:
Objectives of normalization. Limitations of E/R design. Redundancy. Anomalies: insertion, deletion, update. Decomposition. Informal guidelines for relation design. Functional dependencies. Axioms of functional dependencies. Closure. Minimal cover of a set of dependencies. Desirable properties of decompositions: attribute preservation, dependency preservation, lossless join. First normal form. Full functional dependencies. Second normal form. Transitive dependency. Third normal form. Boyce/Codd normal form (BCNF). Multivalued dependencies. Fourth normal form. Fifth normal form. Domain/Key normal form (DKNF). BCNF decomposition algorithm and its properties. Normalization drawbacks. Denormalization.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 8]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 3]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 8: Application Design and Development.
♦ Lectures: 2h. Practice: 2h. Self-study: 10h.
♦ Outline:
Database access from programming languages. ODBC architecture. ODBC drivers. ODBC connection strings. JDBC architecture. Connecting to a DBMS. Preparing and executing queries. Using result sets and cursors. Handling exceptions. Transactions in JDBC. Object-relational mapping. Design patterns for data persistence. Active Record pattern. Data Mapper pattern. Hibernate.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 9]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 9]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 9: Storage and File Structure.
♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
♦ Outline:
Memory hierarchy. Disk storage: physical disk structure, pages and blocks. I/O time: seek latency, transfer time. Disk cache. RAID. SSD. SAN and NAS. Datafiles: blocks and extents. Block structure. Fixed and variable record formats. Large objects (LOBs).
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 10]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 13]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 10: Indexing and Hashing.
♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
♦ Outline:
Indexing concepts. B-tree index. B-tree insertions and deletions. Hash index. Hash functions. Extendible hashing. Bitmap index. Join index. GiST.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 11]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 14]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 11: Query Processing.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline:
Query processing overview. Relational algebra translation. Query tree. Relational algebra equivalences. Heuristics for optimization. Cost-based optimization. Cost factors and estimation. External merge sort. Duplicate elimination. Implementing set operations. Sort-based and hash-based projection. Computing selection without indexes. Computing selection with a clustered index. Computing selection with a B-tree index. Computing selection with a hash index. Computing joins: nested loops. Computing joins: block nested loops. Computing joins: sort-merge join. Computing joins: hash join.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 12, 13]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 15, 16]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 12: Transaction Management.
♦ Lectures: 2h. Practice: 4h. Self-study: 6h.
♦ Outline:
Single-user and multi-user systems. Basic concepts of transactions. ACID properties. Isolation. Serial and interleaved execution. Schedules: serial, serializable. Methods to ensure serializability. Concurrency control. Optimistic and pessimistic concurrency. Two-phase locking. Locking and deadlocks. Implementing isolation levels with locks. Snapshot isolation. Timestamping. Atomicity and durability. Write-ahead log. Redo and undo records. Recovery from a crash. Checkpoints. Distributed transactions. Two-phase commit protocol. Replication.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 14, 15, 16]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 17, 18, 19]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 13: Distributed and Parallel Databases.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline:
Distributed database systems: advantages and drawbacks. Distributed database architectures. Software components and functions of a distributed DBMS. Data placement. Transaction management for a distributed DBMS. Locking protocols. Timestamping protocols. Commit protocols. Distributed recovery from failures. Distributed query processing. Data integration. Parallel database systems.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 17, 18, 19]
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 1248 pp. [Ch. 20]
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 1200 pp.

Topic 14: Data Warehousing and Big Data Management.
♦ Lectures: 2h. Practice: 2h. Self-study: 6h.
♦ Outline:
Role and purpose of a data warehouse. Components of a data warehouse. Data warehousing and OLAP. Semistructured data and XML. Big data concepts and notions.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 1376 pp. [Ch. 20, 23, 26]
Franks B. Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (Wiley and SAS Business Series), Wiley, 2012, 336 pp.

Topic 15: Data Warehouse Architecture and Models.
♦ Lectures: 4h. Practice: 6h. Self-study: 12h.
♦ Outline:
Data management requirements of decision support systems. Historical, summarized, integrated data. Statistical and analytical queries. Business intelligence applications. Three-tier architecture. Data warehouse, data mart. Online analytical processing (OLAP). Conceptual models for decision support. Multidimensional view of the data. Cross-tabulation. Data cubes. Operations with data cubes: roll-up, drill-down, pivot, slice & dice, select. Attribute hierarchies. Types of hierarchies. Query languages for supporting OLAP. SQL extensions: GROUP BY CUBE, GROUP BY ROLLUP. Analytic functions. Rank functions, window functions, lag/lead functions. Multidimensional expressions (MDX). Relational OLAP (ROLAP): star schema, snowflake schema, snowflake constellation. Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions. Indexing of dimensions: B-tree, bitmap, join indexes. Hybrid OLAP (HOLAP). Slowly changing dimensions. Temporal databases. Valid time, transaction time. Bitemporal data model.
♦ Core books (sources of information)
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp. [Ch. 20]
Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. 480 pp.
Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. 208 pp.
Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. 400 pp.
Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. // SIGMOD Record, v.26 n.2, pp. 507-508, 1997.
Harinarayan V., Rajaraman A., Ullman J. Implementing Data Cubes Efficiently. // Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD '96), pp. 205-216, 1996.
Jensen C., Pedersen T., Thomsen C. Multidimensional Databases and Data Warehousing, Morgan & Claypool Publishers, 2010. 111 pp.
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 2009. 1248 pp. [Ch. 10.6, 10.7]
Rainardi V. Building a Data Warehouse: With Examples in SQL Server, Apress, 2008. 541 pp.
Inmon W. H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis, and Design, Technics Publications, 2011. 216 pp.
Kimball R., Ross M. The Data Warehouse Toolkit, Wiley, 2002. 447 pp.

Topic 16: Data Cleaning and Integration.
♦ Lectures: 4h. Practice: 4h. Self-study: 12h.
♦ Outline: Data integration issues and stages. ETL process design and implementation. Error handling in the ETL process. Metadata and ontologies in data integration. Data structures for ETL. Extracting data from external sources. Data profiling. Data cleaning. Data quality dimensions, issues and constraints. Typical cleaning checks. Conforming data. Data enrichment.
Entity resolution. Data transformation. Loading data into the warehouse. Bulk load.
♦ Core books (sources of information)
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 2009. 1248 pp. [Ch. 21]
Kimball R., Caserta J. The Data Warehouse ETL Toolkit, Wiley, 2004. 491 pp.
Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. 480 pp.
Rodrigues F., Coles M., Dye D. Pro SQL Server 2012 Integration Services, Apress, 2012. 636 pp.

Topic 17: Key/Value and Document Databases.
♦ Lectures: 4h. Practice: 4h. Self-study: 12h.
♦ Outline: Key/value stores: Memcached, Berkeley DB, Membase, Riak. Extended key/value (data structures) store: Redis. Document databases: MongoDB.
♦ Core books (sources of information)
Tiwari S. Professional NoSQL, Wrox, 2011. 384 pp.
Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, Pragmatic Bookshelf, 2012. 352 pp.

Topic 18: Map/Reduce and Hadoop.
♦ Lectures: 8h. Practice: 8h. Self-study: 18h.
♦ Outline: Large-scale massive data processing. Map/reduce paradigm. Hadoop map/reduce implementation. Hadoop distributed filesystem HDFS. Hadoop I/O. Developing Hadoop applications. Managing job configuration. Mapper, Reducer, Combiner. Running a job locally and on a cluster. Map/reduce design patterns. Pig. Pig Latin. Hive. HiveQL. HBase as a BigTable implementation.
♦ Core books (sources of information)
White T. Hadoop: The Definitive Guide, 4th ed., O’Reilly, 2015. 756 pp.
Holmes A. Hadoop In Practice, Manning Publications, 2012. 511 pp.
Miner D., Shook A. MapReduce Design Patterns, O’Reilly, 2012. 232 pp.
Gates A. Programming Pig, O’Reilly, 2011. 203 pp.
Sitto, Kevin, Presser, Marshall (2015) Field Guide to Hadoop.
O’Reilly, 2015. 118 pp.
Gunarathne, Thilina (2015) Hadoop MapReduce v2 Cookbook, 2nd ed., Packt, 2015.
Karanth, Sandeep (2014) Mastering Hadoop, Packt, 2014. 351 pp.

Topic 19: Large-scale Distributed Databases.
♦ Lectures: 4h. Practice: 2h. Self-study: 8h.
♦ Outline: Hashing for distributed data storage. Consistent hashing. Distributed hash tables (DHT). Vector clocks and conflict detection. Gossip protocol and hinted handoff. Merkle trees. BigTable data model. Cassandra.
♦ Core books (sources of information)
Tiwari S. Professional NoSQL, Wrox, 2011. 384 pp.
Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, Pragmatic Bookshelf, 2012. 352 pp.
Neeraj, Nishant (2013) Mastering Apache Cassandra, Packt Publishing, 2013. 340 pp.
Bradberry, Russell, Lubow, Eric (2014) Practical Cassandra: A Developer’s Approach, Addison Wesley, 2014. 193 pp.

Topic 20: In-memory Databases.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline: Principles of in-memory databases. Data storage layout, encoding and compression. In-memory data manipulation operations. In-memory queries and tuple reconstruction. Implementation issues: differential buffer, indices, merge process, logging, recovery, workload management. Implications for application development.
♦ Core books (sources of information)
Plattner H. (2013) A Course in In-Memory Data Management, Springer, 2013. 298 pp.

Topic 21: Data Stream Management Systems.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline: Data stream example applications. Data stream systems: STREAM, Aurora. Data stream models: relational and XML. Window-based processing. STREAM CQL. Query semantics. Mapping streams to relations and vice versa. Aurora SQuAL. Operators. Query processing.
STREAM: architecture, query plan, optimizations. Aurora: architecture, optimizations. Distributed stream processing: Borealis.
♦ Core books (sources of information)
Golab L., Özsu M. T. Data Stream Management (Synthesis Lectures on Data Management), Morgan & Claypool Publishers, 2010. 80 pp.
Arasu A., Babu S., Widom J. The CQL Continuous Query Language: Semantic Foundations and Query Execution, VLDB Journal, 15(2), 2006. pp. 121-142.
Arasu A., Babcock B., Babu S., Cieslewicz J., Datar M., Ito K., Motwani R., Srivastava U., Widom J. STREAM: The Stanford Data Stream Management System. // M. Garofalakis, J. Gehrke, and R. Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams, Springer, 2009.
Cherniack M., Zdonik S. Stream-Oriented Query Languages and Operators, Encyclopedia of Database Systems, Springer, 2009. pp. 2848-2854.

8 Methods of Instruction
The course is delivered through lectures and practical classes. In addition to these traditional forms, active and interactive forms are used: discussing real industry case studies; proposing and discussing group project topics and their planned outcomes; and using interactive simulators for database languages.

9 Assessment tools for students' current evaluation and attestation
9.1 Topics for assignments
The course includes two home assignments (one in each year of study), compulsory for all students. Students work in groups of up to 5 on one of the suggested topics.
First home assignment: Build a relational database application using the techniques studied in the course.
Second home assignment: Students should identify a Big Data problem that they would like to work on and complete the following steps:
• Ask a difficult question.
• Identify data sources that you believe might help answer the question.
• Build a data warehouse aimed at answering the question.
• Build an OLAP solution aimed at answering the question.
• Build a Hadoop (or NoSQL) solution aimed at answering the question.
• Compare the solutions built and describe their relative strengths and weaknesses.

9.2 Topics for course final assessment
• Basic database system concepts. Database environment. Database planning. Three-level database architecture.
• Basic relational model concepts. Object-oriented model. Object-relational model. Semi-structured model.
• Mathematical and database relations. Relation schema. Relational database.
• Integrity constraints. Relation keys. Key constraint. Foreign key constraint.
• Relational algebra operations. Selection. Projection.
• Relational algebra operations. Set operations.
• Relational algebra operations. Join, equijoin, antijoin, theta-join. Natural join.
• Relational algebra operations. Division.
• Tuple relational calculus: atoms, formulas, queries.
• SQL data types.
• SQL. Table declaration. Primary keys, unique constraints, default values, nullable attributes.
• SQL. Check constraints.
• SQL. Foreign key constraints. Handling foreign key violations.
• SQL. Database schema modifications.
• SQL. Single-table queries. Filtering conditions. Logical operations IN, ALL, EXISTS.
• SQL. Join queries. Join types: cross, natural, inner, outer, self.
• SQL. Duplicate elimination.
• SQL. Set operations.
• SQL. Nested queries. Correlated nested queries.
• SQL. Aggregate functions.
• SQL. Grouping and group filtering.
• SQL. Query result sorting.
• SQL. INSERT.
• SQL. UPDATE.
• SQL. DELETE.
• SQL. Views: creation, use and updating.
• SQL. Triggers: creation, activation, execution. Multiple triggers.
• SQL. View materialization. Stored procedures.
• Entity-relationship model. Entities and attributes. Entity types. Keys.
• Entity-relationship model. Relationships. Attributes and roles.
Relationship type, degree and cardinality. Relationship participation constraints.
• Entity type hierarchies. Specialization and generalization. Total and partial unions.
• Unified Modeling Language (UML) class diagram. Association and aggregation in UML. Generalization hierarchies in UML. Multiplicity indicators in UML.
• Objectives of normalization. Limitations of E/R design. Data redundancy. Anomalies: insertion, deletion, update.
• Functional dependencies. Axioms of functional dependencies. Closure. Minimal cover of a set of dependencies.
• Desirable properties of decompositions: attribute preservation, dependency preservation, lossless join.
• First normal form. Full functional dependencies. Second normal form.
• Transitive dependency. Third normal form. Boyce/Codd normal form (BCNF).
• Multivalued dependencies. Fourth normal form. Fifth normal form. Domain/Key normal form (DKNF).
• BCNF decomposition algorithm and its properties. Normalization drawbacks.
• JDBC architecture, connecting to DBMS, preparing and executing queries, using result sets and cursors.
• Handling exceptions in JDBC. Transactions in JDBC.
• Object-relational mapping.
• Design patterns for data persistence. Active Record pattern.
• Design patterns for data persistence. Data Mapper pattern. Hibernate.
• Datafiles: blocks and extents. Block structure. Fixed and variable record formats. Large objects (LOBs).
• B-tree index. B-tree insertions and deletions.
• Hash index. Hash functions. Extendible hashing.
• Bitmap index. Join index. GiST.
• Relational algebra translation. Query tree. Relational algebra equivalences.
• Cost-based optimization. Cost factors and estimation.
• External merge sort. Duplicate elimination. Implementing set operations. Sort-based and hash-based projection.
• Computing selection without indexes.
• Computing selection with clustered index.
• Computing selection with B-tree index.
• Computing selection with hash index.
• Computing joins: nested loops.
• Computing joins: block nested loops.
• Computing joins: sort-merge join.
• Computing joins: hash join.
• Basic concepts of transactions. ACID properties.
• Schedules: serial, serializable. Methods to ensure serializability.
• Optimistic and pessimistic concurrency. Two-phase locking. Locking and deadlocks.
• Implementing isolation levels with locks. Snapshot isolation.
• Write-ahead log. Redo and undo records. Recovery from crash.
• Distributed transactions. Two-phase commit protocol.
• Distributed database architectures. Software components and functions of distributed DBMS.
• Transaction management for distributed DBMS. Distributed recovery from failures.
• Distributed query processing. Data integration. Parallel database systems.
• Object-relational mapper.
• Data management requirements of decision support systems. Extract-transform-load process.
• Conceptual models for decision support. Multidimensional view of the data.
• Operations with data cubes: roll-up, drill-down, pivot, slice & dice, select.
• Query languages for supporting OLAP. SQL extensions: GROUP BY CUBE, GROUP BY ROLLUP. Multidimensional expressions (MDX).
• View materialization: optimal set of views, partial order on views, cost model, greedy algorithm.
• Relational OLAP (ROLAP): star schema, snowflake schema, fact constellation.
• Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions.
• Indexing of dimensions: bitmap indexes.
• Indexing of dimensions: join indexes.
• Hybrid OLAP (HOLAP).
• ETL process design and implementation.
• Data cleaning. Data quality dimensions, issues and constraints. Typical cleaning checks.
• Entity resolution. Data transformation.
• Loading data into the warehouse. Bulk load.
• Map/reduce paradigm.
• Hadoop map/reduce implementation. Hadoop distributed filesystem HDFS.
• Map/reduce design patterns.
• Pig. Pig Latin.
• Hive. HiveQL.
• HBase as a BigTable implementation.
• Distributed hash tables (DHT). Vector clocks and conflict detection. Gossip protocol and hinted handoff.
• Key/value stores: properties and usage.
• Extended key/value (data structures) store Redis: properties and usage.
• Document database MongoDB: properties and usage.
• Large-scale distributed databases: Cassandra.
• Principles of in-memory databases. Data storage layout, encoding and compression.
• In-memory data manipulation operations. In-memory queries and tuple reconstruction.
• Data stream models: relational and XML. Window-based processing.
• STREAM CQL. Query semantics. Mapping streams to relations and vice versa.
• Aurora SQuAL. Operators.
• STREAM: architecture, query plan, optimizations.
• Aurora: architecture, optimizations.
• Distributed stream processing: Borealis.

10 Learning resources
10.1 Core Textbooks
Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill, 2010. 1376 pp.
Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall, 2009. 1248 pp.
Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley, 2010. 1200 pp.

10.2 Recommended books
Blaha, Michael (2010) Patterns of Data Modeling (Emerging Directions in Database Systems and Applications), CRC Press, 2010. 261 pp.
Kuate P.H. et al. (2009) NHibernate in Action, Manning Publications, 2009. 400 pp.
Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. 480 pp.
Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. 208 pp.
Kimball R., Ross M.
The Data Warehouse Toolkit, Wiley, 2002. 447 pp.
Kimball R., Caserta J. The Data Warehouse ETL Toolkit, Wiley, 2004. 491 pp.
Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. 400 pp.
Rodrigues F., Coles M., Dye D. Pro SQL Server 2012 Integration Services, Apress, 2012. 636 pp.
Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009. 832 pp.
Melton J., Buxton S. Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann, 2006. 848 pp.
Franks B. Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics (Wiley and SAS Business Series), Wiley, 2012. 336 pp.
White T. Hadoop: The Definitive Guide, 4th ed., O’Reilly, 2015. 768 pp.
Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement, Pragmatic Bookshelf, 2012. 352 pp.
Robinson I. Graph Databases, O’Reilly, 2013. 224 pp.
Golab L., Özsu M. T. Data Stream Management (Synthesis Lectures on Data Management), Morgan & Claypool Publishers, 2010. 80 pp.

10.3 Additional books
Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. // SIGMOD Record, v.26 n.2, pp. 507-508, 1997.
Harinarayan V., Rajaraman A., Ullman J. Implementing Data Cubes Efficiently. // Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD '96), pp. 205-216, 1996.
Jensen C., Pedersen T., Thomsen C. Multidimensional Databases and Data Warehousing, Morgan & Claypool Publishers, 2010. 111 pp.
Rainardi V. Building a Data Warehouse: With Examples in SQL Server, Apress, 2008. 541 pp.
Inmon W. H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis, and Design, Technics Publications, 2011. 216 pp.
Holmes A.
Hadoop In Practice, Manning Publications, 2012. 511 pp.
Miner D., Shook A. MapReduce Design Patterns, O’Reilly, 2012. 232 pp.
Gates A. Programming Pig, O’Reilly, 2011. 203 pp.
Arasu A., Babu S., Widom J. The CQL Continuous Query Language: Semantic Foundations and Query Execution, VLDB Journal, 15(2), 2006. pp. 121-142.
Arasu A., Babcock B., Babu S., Cieslewicz J., Datar M., Ito K., Motwani R., Srivastava U., Widom J. STREAM: The Stanford Data Stream Management System. // M. Garofalakis, J. Gehrke, and R. Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams, Springer, 2009.
Cherniack M., Zdonik S. Stream-Oriented Query Languages and Operators, Encyclopedia of Database Systems, Springer, 2009. pp. 2848-2854.
Tiwari S. Professional NoSQL, Wrox, 2011. 384 pp.
Neeraj, Nishant (2013) Mastering Apache Cassandra, Packt Publishing, 2013. 340 pp.
Bradberry, Russell, Lubow, Eric (2014) Practical Cassandra: A Developer’s Approach, Addison Wesley, 2014. 193 pp.
Plattner H. (2013) A Course in In-Memory Data Management, Springer, 2013. 298 pp.
Sitto, Kevin, Presser, Marshall (2015) Field Guide to Hadoop, O’Reilly, 2015. 118 pp.
Gunarathne, Thilina (2015) Hadoop MapReduce v2 Cookbook, 2nd ed., Packt, 2015.
Karanth, Sandeep (2014) Mastering Hadoop, Packt, 2014. 351 pp.

10.4 Reference books, dictionaries, encyclopedias
MSDN. Available at: http://msdn.microsoft.com
RSDN. Databases. Available at: http://rsdn.ru/summary/248.xml
CITForum. Available at: http://www.citforum.ru/database
SQL.RU. Available at: http://sql.ru
Liu L., Özsu M.T. (2009) Encyclopedia of Database Systems, Springer, 2009. 748 pp.

10.5 Software tools
• Microsoft SQL Server 2008 R2 (or later)
• Microsoft SQL Server Analysis Services 2008 R2 (or later)
• Microsoft Visual Studio 2008-2010 (or later)
• Apache Hadoop 2
• Apache Pig
• Apache Hive
• Apache Cassandra
• MongoDB

10.6 Remote course support
The LMS is used for remote course support.

11 Special Equipment
A projector for lectures and practical classes. Computer classrooms with Microsoft Visual Studio 2010 (or later) and Microsoft SQL Server 2008 (or later) Management Studio installed.