Federal State Autonomous Educational Institution
of Higher Professional Education
"National Research University Higher School of Economics"
Faculty of Computer Science
Department of Software Engineering
Course syllabus
Databases
(in English)
for the educational program "Software Engineering"
field of study 09.03.04 "Software Engineering"
level: bachelor
Syllabus authors
Associate Professor, Candidate of Technical Sciences A.D. Breyman, abreyman@hse.ru
Approved at the meeting of the Department of Software Engineering «___»____________ 2015
Head of the Department S.M. Avdoshin ________________
Recommended by the Academic Council of the educational program
«___»____________ 2015, minutes No. _________________
Approved «___»____________ 2015
Academic Supervisor of the educational program
V.V. Shilov _________________
Moscow, 2015
This syllabus may not be used by other units of the university or by other higher education institutions without the permission of the unit that developed it.
National Research University Higher School of Economics
Syllabus of the course "Databases"
for field of study 09.03.04 "Software Engineering", bachelor's level
1 Scope and Regulations
The syllabus is intended for teachers responsible for the course (or closely related disciplines), teaching assistants, students enrolled in the course, and experts and statutory bodies carrying out assigned or regular accreditations. It is prepared in accordance with
• the educational standard of the federal state autonomous educational institution of higher professional education National Research University Higher School of Economics;
• the B.Sc. curriculum ("Software Engineering", area code 09.03.04), 3rd year, 2015-2016 academic year;
• the B.Sc. curriculum ("Software Engineering", area code 09.03.04), 4th year, 2015-2016 academic year.
The course is offered to students of the Bachelor Program "Software Engineering" (code 09.03.04) at the Faculty of Computer Science of the National Research University Higher School of Economics (HSE).
This mandatory course belongs to the professional cycle unit of the curriculum (Б.3.Б unit / Base module [Professional cycle disciplines Б.3] of the 2015-2016 academic year's working syllabus) and is included in the list of training courses of the bachelor's program (3rd and 4th years of study).
It is a four-module course, delivered in modules #3 and #4 of the third academic year and in modules #1 and #2 of the fourth academic year. The course carries 8 credits. Total course length is 288 academic hours, including 120 classroom hours (60 lecture hours and 60 practice hours) and 168 self-study hours.
Academic control forms are two home assignments in the form of group projects (one in each year), one written exam after module #4 of the 3rd year, and one written exam after module #2 of the 4th year.
2 Course Objectives
The student should develop skills and understanding in:
• the design methodology for databases and the verification of their structural correctness;
• implementing databases and application software in the relational model as well as in the map/reduce paradigm;
• using query languages, primarily SQL, and other database support software;
• applying the theory behind various database models and query languages;
• implementing security and integrity policies relating to databases;
• working in group settings to design and implement database projects.
The objective of the first part of the course (taught in the 3rd year) is to introduce students to (mostly) online transaction processing and relational database theory, database design and implementation, including entity-relationship data modeling, the relational model, relational algebra and calculus, functional dependencies and normalization theory, and relational query languages, including SQL.
The objective of the second part of the course (taught in the 4th year) is to form professional competencies related to the design and implementation of other kinds of databases, including data warehouses, online analytical processing and big data management tools. Students will get a grasp of the strengths and weaknesses of a wide spectrum of approaches to data storage, search and retrieval, enabling an informed choice of database model.
This course studies different conceptual database models and their properties. For these conceptual
models the course will concentrate on the following points: Why was the database model introduced? Which of the shortcomings of other models does it address? What are the most important
concepts and notions for the database model? How is the model implemented? Which are the main
techniques? The importance of understanding the internals of a particular database model cannot
be overemphasized as it is closely connected to its limitations.
3 Learning Outcomes (Competencies to be formed)
After taking this course the student should have achieved the following outcomes:
Knows data modeling concepts and understands the relational data model. Can compose queries to relational databases using relational algebra, tuple relational calculus and SQL.
Knows methods of database design, including the entity-relationship approach and the normalization-based approach.
Knows and is able to apply database application design and development methods suitable for object-oriented software systems, including object-relational mappers, with their advantages and disadvantages.
Knows models and methods of the internal organization of relational databases, including file storage, indexing, query processing and transaction management.
Students should be able to understand the language of the studied models, choose and use appropriate models and programming languages, and implement systems using the chosen models, methods and tools.
The course contributes to the development of the following competencies:
Competency | Code | Descriptors | Education forms and methods for competency formation
Professional, Scientific research activities | ПК-1 | The ability to apply main concepts, principles, theories and facts connected to computer science in the process of scientific research and problem solving. | Lectures, Essay composing, Home assignments, Self-study
Professional, Scientific research activities | ПК-2 | The ability to formalize in one's subject area within the constraints of the methods of research. | Lectures, Practical studies in data modeling, Home assignments (data modeling part)
Professional, Scientific research activities | ПК-3 | Preparedness to use research techniques and tools to study objects of professional activity. | Lectures, Practical studies in data model assessment and evaluation, Home assignments (solutions evaluation part)
Professional, Analytical activities | ПК-6 | The ability to formalize the subject area of a software project and to develop specifications for the software product components. | Lectures, Practical studies, Home assignments
Professional, Project activities | ПК-10 | The ability to design, develop and test software products. | Lectures, Practical studies, Home assignments
Professional, Project activities | ПК-11 | The ability to read, understand, and extract the main idea from source code and documentation. | Practical studies, Industrial cases reviews
Professional, Project activities | ПК-12 | Skills in modeling, analysis and use of formal methods of software | Lectures, Practical studies, Home assignments
Professional, Technology-oriented activities | ПК-15 | Skills in using operating systems, network technologies, software interface development tools, languages and methods of formal specifications, database management systems. | Lectures, Practical studies, Home assignments
Professional, Technology-oriented activities | ПК-16 | Skills in using different software development technologies. | Lectures, Practical studies, Home assignments
Professional, Technology-oriented activities | ПК-17 | The ability to apply core software development methods and tools. | Lectures, Practical studies, Home assignments
Professional, Technology-oriented activities | ПК-19 | The ability to understand life cycle standards and models. | Lectures, Practical studies, Home assignments
4 The Course Position in the Curriculum
This mandatory course belongs to the professional cycle unit of the curriculum (Б.3 unit / Base module [Professional cycle disciplines Б.3] of the 2015-2016 academic year's working syllabus) and is included in the list of training courses of the bachelor's program (3rd year of studies).
Prerequisite courses: Computer Science (Informatics), Mathematical Logic and Theory of Algorithms, Discrete Mathematics, Programming, Software Construction, Algorithms Design and Analysis. The course builds on knowledge of the foundations of discrete mathematics (including logic and set theory), computer science, and computer programming. Students are assumed to understand basic data structures and their uses (lists, arrays); concepts of object-oriented programming (classes, inheritance); modular software design and method linkage; and the design, implementation and testing of medium-sized problem solutions. They are also expected to be familiar with some contexts in which databases are used.
Courses that are based on this course: Information Systems, Program projects management, Group project on Software Engineering, Joint year projects, Final projects.
5 Course Plan
№ | Topic title | Total hours | Lectures | Practice (seminars) | Self-study
Module #3, 3rd year
1 | Introduction. | 6 | 2 | 2 | 2
2 | Data modeling. | 6 | 2 | 2 | 2
3 | Relational Model. | 6 | 2 | 2 | 2
4 | Relational Query Languages. | 8 | 2 | 2 | 4
5 | SQL. | 22 | 4 | 2 | 14
6 | Database Design: The E/R and UML Approaches. | 10 | 2 | 2 | 8
7 | Relational Database Design. | 14 | 2 | 2 | 10
  | Module #3 totals | 72 | 16 | 14 | 42
Module #4, 3rd year
8 | Application Design and Development. | 20 | 4 | 4 | 12
9 | Storage and File Structure. | 8 | 2 | 2 | 4
10 | Indexing and Hashing. | 8 | 2 | 2 | 4
11 | Query Processing. | 12 | 2 | 2 | 8
12 | Transaction Management. | 12 | 2 | 4 | 6
13 | Distributed and Parallel Databases. | 12 | 2 | 2 | 8
  | Module #4 totals | 72 | 14 | 16 | 42
  | 3rd year totals | 144 | 30 | 30 | 84
Module #1, 4th year
14 | Data warehousing and Big Data Management | 10 | 2 | 2 | 6
15 | Data Warehousing Architectures and Models | 22 | 4 | 6 | 12
16 | Data Cleaning And Integration | 20 | 4 | 4 | 12
17 | Key/Value and Document Databases | 20 | 4 | 4 | 12
  | Module #1 totals | 72 | 14 | 16 | 42
Module #2, 4th year
18 | Map/Reduce and Hadoop | 34 | 8 | 8 | 18
19 | Large-scale Distributed Databases | 14 | 4 | 2 | 8
20 | In-memory Databases | 12 | 2 | 2 | 8
21 | Data Streams Management | 12 | 2 | 2 | 8
  | Module #2 totals | 72 | 16 | 14 | 42
  | 4th year totals | 144 | 30 | 30 | 84
  | TOTAL | 288 | 60 | 60 | 168
6 Assessment
Type | Form | 3rd year (week) | 4th year (week) | Parameters
Current | Test | 7 | 7 | Written test, 30 min.
Current | Quiz | * * | * * | Each week, first 10 minutes of lecture
Current | Essay | 4, 4 | 4, 4 | Written essay, up to 2 pages, homework
Current | Homework presentation | * | * | Group presentation of home assignment followed by demonstration
Interim | Exam | * | | Written exam, 90 min.
Final | Exam | | * | Written exam, 90 min.
6.1 Guidelines for Knowledge Assessment
Home assignment 1 (HA1) is prepared in modules 3 and 4 of the 3rd year by students in groups of up to 5. It includes the design, implementation and testing of a database and a database application for a given subject area (chosen by the group and approved by the instructor, or assigned by the instructor). The results of HA1 should be presented as a report consisting of a design document, an implementation description and testing results. Mandatory appendices are the application source code and the database creation script. The report should be submitted to the LMS no later than 7 calendar days before the assigned date of its presentation (in the last week of module 4). The project should be presented and demonstrated by all group members. Each group member should demonstrate complete understanding of all project details and correctly answer at least two of the instructor's questions.
The written test at the end of the first module (last week of module 3 of the 3rd year of study) is held in a lecture room for all students enrolled in the course. The test covers all course material of the first module.
The written exam at the end of the second module (module 4 of the 3rd year of study) is held in a lecture room for all students enrolled in the course. The exam covers all course material of the first two modules.
Home assignment 2 (HA2) is prepared in modules 1 and 2 of the 4th year by students in groups of up to 5. Students should identify a Big Data problem that they would like to work on, formulate a difficult question to ask of the data, and identify data sources that might help answer the question. The project result should contain two (possibly partial) solutions for the identified problem: a) a data warehouse-based solution including OLAP options and b) a Hadoop, MongoDB or Cassandra-based solution. The solutions built should be compared, and their relative strengths and weaknesses identified and described. The chosen Big Data problem should be approved by the instructor. The results of HA2 should be presented as a project report consisting of a design document, an implementation description, testing results and a comparison of the solutions. Mandatory appendices are the source code for both applications and the data management scripts. The report should be submitted to the LMS no later than 7 calendar days before the assigned date of its presentation (in the last week of module 2 of the 4th year). The project should be presented and demonstrated by all group members. Each group member should demonstrate complete understanding of all project details and correctly answer at least two of the instructor's questions.
6.2 Grading System
Rounding procedure for grades (where applicable): up to an integer number of points.
Practice activity during practice hours is assessed by evaluating student involvement in discussions as well as the quality of exercise performance during practice. Practice activity grades ($O_{\text{classroom 3rd year}}$ and $O_{\text{classroom 4th year}}$) use a ten-point scale.
Students have to write an essay of up to 2 pages (on a topic proposed by the instructor at the first lecture) once in each module (due in the 4th week of each module). Grades $O_{\text{essays 3rd year}}$ and $O_{\text{essays 4th year}}$ are the arithmetic averages of the two 3rd-year and the two 4th-year essay grades respectively (ten-point scale, rounding up to an integer number of points).
Students have to answer quiz questions in the first 10 minutes of each lecture. Grades $O_{\text{quiz 3rd year}}$ and $O_{\text{quiz 4th year}}$ are the arithmetic averages of the quiz grades for the respective year (ten-point scale, rounding up to an integer number of points). The average is calculated by dividing the sum of all the student's quiz grades by the total number of quizzes in the year.
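The quiz averaging rule above can be sketched in Python as follows. This is a minimal illustration, not part of the official grading procedure: the function name `yearly_quiz_grade` and its parameters are hypothetical, and the sketch assumes that "rounding up" means taking the ceiling and that quizzes a student did not answer simply add nothing to the sum.

```python
import math


def yearly_quiz_grade(quiz_grades, total_quizzes):
    """Average quiz grade for one year, rounded up to an integer.

    quiz_grades   -- grades (0..10) of the quizzes the student answered
    total_quizzes -- total number of quizzes held in the year (hypothetical
                     parameter name); the sum is divided by this count,
                     not by the number of quizzes actually answered
    """
    if total_quizzes <= 0:
        raise ValueError("total_quizzes must be positive")
    if len(quiz_grades) > total_quizzes:
        raise ValueError("more grades than quizzes held")
    # Assumption: "rounding up" is read literally as the ceiling.
    # Replace math.ceil with another rule if nearest-integer rounding is meant.
    return math.ceil(sum(quiz_grades) / total_quizzes)
```

For example, a student who answered three of four quizzes with grades 8, 7 and 10 has a sum of 25 over 4 quizzes, that is 6.25, which this sketch rounds up to 7.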
Students have to answer test questions in the 7th week of the 3rd module of the 3rd year and in the 7th week of the 2nd module of the 4th year. Grades $O_{\text{test 3rd year}}$ and $O_{\text{test 4th year}}$ for these tests are on a ten-point scale.
The value of the $O_{\text{homework 3rd year}}$ (home assignment 1) component of the final grade formula is an integer from the interval [0, 10]. It consists of the common score for the report and presentation (from 0 to 5; the same score for all group members) and the individual student score for the answers to the questions (from 0 to 5). If a student misses the project presentation for a valid reason, s/he receives an «absence» grade. If a student misses the project presentation for any other reason, s/he receives a grade with the individual score set to 0.
The value of the $O_{\text{homework 4th year}}$ (home assignment 2) component of the final grade formula is an integer from the interval [0, 10]. It consists of the common score for the report and presentation (from 0 to 5; the same score for all group members) and the individual student score for the answers to the questions (from 0 to 5). If a student misses the project presentation for a valid reason, s/he receives an «absence» grade. If a student misses the project presentation for any other reason, s/he receives a grade with the individual score set to 0.
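The homework scoring rule described above could be expressed as the following Python sketch. The function name, its parameters and the use of `None` to stand for the «absence» grade are assumptions made for illustration only.

```python
from typing import Optional


def homework_grade(common: int, individual: int,
                   presented: bool = True,
                   valid_reason: bool = False) -> Optional[int]:
    """Homework component of the grade on the [0, 10] scale.

    common     -- group score for the report and presentation (0..5)
    individual -- student's score for answers to the questions (0..5)
    Returns None to represent the «absence» grade (an assumption of
    this sketch; the syllabus does not say how it is recorded).
    """
    if not (0 <= common <= 5 and 0 <= individual <= 5):
        raise ValueError("each score must be in the range 0..5")
    if not presented:
        if valid_reason:
            return None  # missed for a valid reason: «absence»
        individual = 0   # missed for any other reason
    return common + individual
```

With a common score of 4 and an individual score of 3, a student who presents receives 7, while a student who misses the presentation without a valid reason receives 4.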
The written test at the end of the first module (last week of module 3 of the 3rd year of study) is assessed on the usual ten-point scale.
The interim written exam at the end of the second module (module 4 of the 3rd year of study), $O_{\text{interim exam}}$, is assessed on the usual ten-point scale.
The final written exam at the end of the fourth module (module 2 of the 4th year of study), $O_{\text{final exam}}$, is assessed on the usual ten-point scale.
The cumulative grade for the student's current results in the 3rd year is calculated using the following formula:
$$O_{\text{cumulative 3rd year}} = 0.7 \cdot O_{\text{current 3rd year}} + 0.3 \cdot O_{\text{classroom 3rd year}}$$
where
$$O_{\text{current 3rd year}} = 0.2 \cdot O_{\text{essays 3rd year}} + 0.2 \cdot O_{\text{test 3rd year}} + 0.2 \cdot O_{\text{quiz 3rd year}} + 0.6 \cdot O_{\text{homework 3rd year}}$$
The interim grade for the 3rd year is calculated according to the following formula:
$$O_{\text{interim 3rd year}} = 0.5 \cdot O_{\text{current 3rd year}} + 0.5 \cdot O_{\text{interim exam}}$$
The cumulative grade for the student's current results in the 4th year is calculated using the following formula:
$$O_{\text{cumulative 4th year}} = 0.7 \cdot O_{\text{current 4th year}} + 0.3 \cdot O_{\text{classroom 4th year}}$$
where
$$O_{\text{current 4th year}} = 0.2 \cdot O_{\text{essays 4th year}} + 0.2 \cdot O_{\text{test 4th year}} + 0.2 \cdot O_{\text{quiz 4th year}} + 0.6 \cdot O_{\text{homework 4th year}}$$
The final cumulative grade for the student is calculated using the following formula:
$$O_{\text{cumulative final}} = \frac{O_{\text{interim 3rd year}} + O_{\text{cumulative 4th year}}}{2}$$
The final grade for the student is calculated using the following formula:
$$O_{\text{final}} = 0.5 \cdot O_{\text{cumulative final}} + 0.5 \cdot O_{\text{final exam}}$$
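The grade formulas above can be traced with a short Python sketch. It is an illustration under stated assumptions rather than an official calculator: the class and function names are hypothetical, the weights are copied from the formulas as printed (the four weights of the current grade add up to 1.2, so that component may exceed 10), and no rounding is applied because the syllabus does not say at which steps it is applicable.

```python
from dataclasses import dataclass


@dataclass
class YearGrades:
    """Component grades for one year, each on the ten-point scale."""
    essays: float
    test: float
    quiz: float
    homework: float
    classroom: float


def current(g: YearGrades) -> float:
    # Weights exactly as printed in the syllabus; they sum to 1.2.
    return 0.2 * g.essays + 0.2 * g.test + 0.2 * g.quiz + 0.6 * g.homework


def cumulative(g: YearGrades) -> float:
    return 0.7 * current(g) + 0.3 * g.classroom


def interim_3rd_year(g3: YearGrades, interim_exam: float) -> float:
    # As printed, the interim grade uses the current grade,
    # not the cumulative grade, of the 3rd year.
    return 0.5 * current(g3) + 0.5 * interim_exam


def final_grade(g3: YearGrades, g4: YearGrades,
                interim_exam: float, final_exam: float) -> float:
    cumulative_final = (interim_3rd_year(g3, interim_exam) + cumulative(g4)) / 2
    return 0.5 * cumulative_final + 0.5 * final_exam
```

A call such as `final_grade(YearGrades(5, 5, 5, 5, 5), YearGrades(5, 5, 5, 5, 5), 8, 8)` walks through every formula in order; any rounding would have to be added once the intended rounding points are confirmed.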
7 Detailed Course Contents
• Topic 1: Introduction.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
  ♦ Outline:
    ▪ Course overview and logistics.
    ▪ History of data management approaches.
    ▪ Basic database system concepts.
    ▪ Database environment.
    ▪ Database users.
    ▪ Database development process.
    ▪ Database planning.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 1]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 1]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 2: Data Modeling.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
  ♦ Outline:
    ▪ Three-level database architecture.
    ▪ Data model.
    ▪ Data independence.
    ▪ Inverted files.
    ▪ Early data models: hierarchical and network.
    ▪ Basic relational model concepts.
    ▪ Object-oriented model.
    ▪ Object-relational model.
    ▪ Semi-structured model.
    ▪ Semantic data models.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 1]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 1, 2.1]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 3: Relational Model.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 2h.
  ♦ Outline:
    ▪ History of the relational model.
    ▪ Advantages of the relational model.
    ▪ Basic relational data structures.
    ▪ Mathematical and database relations.
    ▪ Relation schema.
    ▪ Relational database.
    ▪ Integrity constraints.
    ▪ Relation keys.
    ▪ Key constraint.
    ▪ Foreign key constraint.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 2]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 2.2]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 4: Relational Query Languages.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
  ♦ Outline:
    ▪ Relational algebra.
    ▪ Relational algebra operations.
    ▪ Selection.
    ▪ Projection.
    ▪ Set operations.
    ▪ Renaming.
    ▪ Join, equijoin, antijoin, theta-join.
    ▪ Natural join.
    ▪ Division.
    ▪ Tuple relational calculus: atoms, formulas, queries.
    ▪ Domain relational calculus.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 6]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 5]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 5: Structured Query Language.
  ♦ Lectures: 4h. Practice: 2h. Self-study: 14h.
  ♦ Outline:
    ▪ SQL language history. SQL standards.
    ▪ Data definition and data manipulation (sub)languages.
    ▪ SQL data types.
    ▪ Table declaration. Primary keys, unique constraints, default values, nullable attributes.
    ▪ Check constraints.
    ▪ Foreign key constraints.
    ▪ Handling foreign key violations.
    ▪ Indexes.
    ▪ Database schema modifications.
    ▪ SQL query sublanguage: SELECT.
    ▪ Single-table queries. Filtering conditions. Logical operations IN, ALL, EXISTS.
    ▪ Join queries. Join types: cross, natural, inner, outer, self.
    ▪ Duplicate elimination.
    ▪ Set operations.
    ▪ Nested queries. Correlated nested queries.
    ▪ Aggregate functions.
    ▪ Grouping and group filtering.
    ▪ Query result sorting.
    ▪ INSERT.
    ▪ UPDATE.
    ▪ DELETE.
    ▪ Views: creation, use and updating.
    ▪ Triggers: creation, activation, execution. Multiple triggers.
    ▪ View materialization.
    ▪ SQL procedural extensions and dialects: T-SQL, PL/SQL, PL/pgSQL.
    ▪ Stored procedures.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 3, 4, 5]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 6, 7, 8]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 6: Database Design: E/R and UML Approaches.
  ♦ Outline:
    ▪ Entity-relationship model.
    ▪ Entities and attributes.
    ▪ Entity types.
    ▪ Keys.
    ▪ E/R diagram.
    ▪ Relationships.
    ▪ Attributes and roles.
    ▪ Relationship type, degree and cardinality.
    ▪ Relationship participation constraints.
    ▪ Strong and weak entities.
    ▪ Entity type hierarchies.
    ▪ Specialization and generalization.
    ▪ Total and partial unions.
    ▪ Unified Modeling Language (UML) class diagram.
    ▪ Association and aggregation in UML.
    ▪ Generalization hierarchies in UML.
    ▪ Multiplicity indicators in UML.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 7]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 4]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 7: Relational Database Design.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
  ♦ Outline:
    ▪ Objectives of normalization.
    ▪ Limitations of E/R design.
    ▪ Redundancy.
    ▪ Anomalies: insertion, deletion, update.
    ▪ Decomposition.
    ▪ Informal guidelines for relation design.
    ▪ Functional dependencies.
    ▪ Axioms of functional dependencies.
    ▪ Closure.
    ▪ Minimal cover of a set of dependencies.
    ▪ Desirable properties of decompositions: attribute preservation, dependency preservation, lossless join.
    ▪ First normal form.
    ▪ Full functional dependencies.
    ▪ Second normal form.
    ▪ Transitive dependency.
    ▪ Third normal form.
    ▪ Boyce/Codd normal form (BCNF).
    ▪ Multivalued dependencies.
    ▪ Fourth normal form.
    ▪ Fifth normal form.
    ▪ Domain/Key normal form (DKNF).
    ▪ BCNF decomposition algorithm and its properties.
    ▪ Normalization drawbacks.
    ▪ Denormalization.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 8]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 3]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 8: Application Design and Development.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 10h.
  ♦ Outline:
    ▪ Database access from programming languages.
    ▪ ODBC architecture.
    ▪ ODBC drivers.
    ▪ ODBC connection strings.
    ▪ JDBC architecture.
    ▪ Connecting to DBMS.
    ▪ Preparing and executing queries.
    ▪ Using result sets and cursors.
    ▪ Handling exceptions.
    ▪ Transactions in JDBC.
    ▪ Object-relational mapping.
    ▪ Design patterns for data persistence.
    ▪ Active Record pattern.
    ▪ Data Mapper pattern.
    ▪ Hibernate.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 9]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 9]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 9: Storage and File Structure.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
  ♦ Outline:
    ▪ Memory hierarchy.
    ▪ Disk storage: physical disk structure, pages and blocks.
    ▪ I/O time: seek latency, transfer time.
    ▪ Disk cache.
    ▪ RAID.
    ▪ SSD.
    ▪ SAN and NAS.
    ▪ Datafiles: blocks and extents. Block structure.
    ▪ Fixed and variable record formats.
    ▪ Large objects (LOBs).
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 10]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 13]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 10: Indexing and Hashing.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 4h.
  ♦ Outline:
    ▪ Indexing concepts.
    ▪ B-tree index.
    ▪ B-tree insertions and deletions.
    ▪ Hash index.
    ▪ Hash functions.
    ▪ Extendible hashing.
    ▪ Bitmap index.
    ▪ Join index.
    ▪ GiST.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 11]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 14]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 11: Query processing.
  ♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
  ♦ Outline:
    ▪ Query processing overview.
    ▪ Relational algebra translation.
    ▪ Query tree.
    ▪ Relational algebra equivalences.
    ▪ Heuristics for optimization.
    ▪ Cost-based optimization.
    ▪ Cost factors and estimation.
    ▪ External merge sort.
    ▪ Duplicate elimination.
    ▪ Implementing set operations.
    ▪ Sort-based and hash-based projection.
    ▪ Computing selection without indexes.
    ▪ Computing selection with clustered index.
    ▪ Computing selection with B-tree index.
    ▪ Computing selection with hash index.
    ▪ Computing joins: nested loops.
    ▪ Computing joins: block nested loops.
    ▪ Computing joins: sort-merge join.
    ▪ Computing joins: hash join.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 12, 13]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 15, 16]
    ▪ Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison-Wesley. 1200 pp.
• Topic 12: Transaction management.
  ♦ Lectures: 2h. Practice: 4h. Self-study: 6h.
  ♦ Outline:
    ▪ Single-user and multi-user systems.
    ▪ Basic concepts of transactions.
    ▪ ACID properties.
    ▪ Isolation.
    ▪ Serial and interleaved execution.
    ▪ Schedules: serial, serializable.
    ▪ Methods to ensure serializability.
    ▪ Concurrency control.
    ▪ Optimistic and pessimistic concurrency.
    ▪ Two-phase locking.
    ▪ Locking and deadlocks.
    ▪ Implementing isolation levels with locks.
    ▪ Snapshot isolation.
    ▪ Timestamping.
    ▪ Atomicity and durability.
    ▪ Write-ahead log.
    ▪ Redo and undo records.
    ▪ Recovery from crash.
    ▪ Checkpoints.
    ▪ Distributed transactions.
    ▪ Two-phase commit protocol.
    ▪ Replication.
  ♦ Core books (sources of information)
    ▪ Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed., McGraw-Hill. 1376 pp. [Ch. 14, 15, 16]
    ▪ Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd ed., Prentice Hall. 1248 pp. [Ch. 17, 18, 19]
13
Национальный исследовательский университет «Высшая школа экономики»
Программа дисциплины Базы данных
для направления 09.03.04 «Программная инженерия» подготовки бакалавра

Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley,
2010. — 1200 pp.
 Тopic 13: Distributed and parallel databases.
♦ Lectures: 2h. Practice: 2h. Self:study: 8h.
♦ Outline:
 Distributed database systems: advantages and drawbacks.
 Distributed database architectures.
 Software components and functions of distributed DBMS.
 Data placement.
 Transaction management for distributed DBMS.
 Locking protocols.
 Timestamping protocols.
 Commit protocols.
 Distributed recovery from failures.
 Distributed query processing.
 Data integration.
 Parallel database systems.
♦ Core books (sources of information)

Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed, McGrawHill, 2010. — 1376pp. [Ch.17,18,19]

Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. — 1248pp. [Ch.20]

Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley,
2010. — 1200 pp.

 Тopic 14: Data Warehousing and Big Data Management.
♦ Lectures: 2h. Practice: 2h. Self:study: 6h.
♦ Outline:
 Role and purpose of a data warehouse.
 Components of a data warehouse.
 Data warehousing and OLAP.
 Semistructured data and XML.
 Big data concepts and notions.
♦ Core books (sources of information)

Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed,
McGraw-Hill, 2010. — 1376pp. [Ch.20,23,26]

Franks B. Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams
with Advanced Analytics (Wiley and SAS Business Series), Wiley, 2012. — 336pp.

• Topic 15: Data Warehouse Architecture and Models.
♦ Lectures: 4h. Practice: 6h. Self-study: 12h.
♦ Outline:
• Data management requirements of decision support systems.
• Historical, summarized, integrated data.
• Statistical and analytical queries.
• Business intelligence applications.
• Three-tier architecture.
• Data warehouse, data mart.
• Online analytical processing (OLAP).
• Conceptual models for decision support.
• Multidimensional view of the data.
• Cross-tabulation.
• Data cubes.
• Operations on data cubes: roll-up, drill-down, pivot, slice & dice, select.
• Attribute hierarchies. Types of hierarchies.
• Query languages for supporting OLAP.
• SQL extensions: GROUP BY CUBE, GROUP BY ROLLUP.
• Analytic functions. Rank functions, window functions, lag/lead functions.
• Multidimensional expressions (MDX).
• Relational OLAP (ROLAP): Star schema, snowflake schema, snowflake constellation.
• Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions.
• Indexing of dimensions: B-tree, bitmap, join indexes.
• Hybrid OLAP (HOLAP).
• Slowly changing dimensions.
• Temporal databases.
• Valid time, transaction time.
• Bitemporal data model.
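As an informal illustration of the data cube and GROUP BY CUBE / GROUP BY ROLLUP items in the outline above, the following minimal Python sketch aggregates a tiny fact table over every subset of dimensions (cube) and over every prefix of the dimension list (rollup). The fact table `SALES`, the dimension names, and the `"ALL"` marker for a rolled-up dimension are assumptions made only for this example; a real DBMS would typically mark rolled-up columns with NULL.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (region, product, amount).
SALES = [
    ("North", "Laptop", 100),
    ("North", "Phone", 50),
    ("South", "Laptop", 70),
    ("South", "Phone", 30),
]
DIMENSIONS = ("region", "product")


def cube(rows, dims):
    """Sum the measure for every subset of dimensions, like GROUP BY CUBE."""
    totals = defaultdict(int)
    for row in rows:
        *keys, amount = row
        for size in range(len(dims) + 1):
            for kept in combinations(range(len(dims)), size):
                # Dimensions that are not kept are rolled up to "ALL".
                cell = tuple(keys[i] if i in kept else "ALL" for i in range(len(dims)))
                totals[cell] += amount
    return dict(totals)


def rollup(rows, dims):
    """Sum the measure for each prefix of the dimension list, like GROUP BY ROLLUP."""
    totals = defaultdict(int)
    for row in rows:
        *keys, amount = row
        for size in range(len(dims) + 1):
            cell = tuple(keys[i] if i < size else "ALL" for i in range(len(dims)))
            totals[cell] += amount
    return dict(totals)


if __name__ == "__main__":
    for cell, total in sorted(cube(SALES, DIMENSIONS).items()):
        print(cell, total)
```

In this sketch the cube contains the cell `("ALL", "Laptop")`, while the rollup does not, because rollup only aggregates along the fixed hierarchy region, then product.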
♦ Core books (sources of information)

Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed,
McGraw-Hill, 2010. — 1376pp. [Ch. 20]

Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies,
McGraw-Hill Osborne Media, 2009. — 480pp.

Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. — 208pp.

Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. —
400pp.

Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. // SIGMOD Record, v.26 n.2, pp.507-508, 1997.

Harinarayan V., Rajaraman A., and Ullman J. Implementing Data Cubes Efficiently. // In Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96),
pp. 205-216, 1996.

Jensen C., Pedersen T., Thomsen C. Multidimensional Databases and Data Warehousing, Morgan &
Claypool Publishers, 2010. — 111p.

Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. — 1248pp. [Ch.10.6,10.7]

Rainardi V. Building a Data Warehouse: With Examples in SQL Server. Apress, 2008. —
541pp.

Inmon W. H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis,
and Design, Technics Publications, 2011. — 216pp.

Kimball R., Ross M. The Data Warehouse Toolkit, Wiley, 2002. — 447pp.
• Topic 16: Data Cleaning and Integration.
♦ Lectures: 4h. Practice: 4h. Self-study: 12h.
♦ Outline:
• Data integration issues and stages.
• ETL process design and implementation.
• Error handling in the ETL process.
• Metadata and ontologies in data integration.
• Data structures for ETL.
• Extracting data from external sources.
• Data profiling.
• Data cleaning. Data quality dimensions, issues and constraints.
• Typical cleaning checks.
• Conforming data.
• Data enrichment.
• Entity resolution.
• Data transformation.
• Loading data into the warehouse.
• Bulk load.
♦ Core books (sources of information)

Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd Edition, Prentice Hall, 2009. — 1248pp. [Ch.21]

Kimball R., Caserta J. The Data Warehouse ETL Toolkit, Wiley, 2004. — 491pp.

Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies,
McGraw-Hill Osborne Media, 2009. — 480pp.

Rodrigues F., Coles M., Dye D. Pro SQL Server 2012 Integration Services, Apress, 2012. — 636
pp.
• Topic 17: Key/Value and Document Databases.
♦ Lectures: 4h. Practice: 4h. Self-study: 12h.
♦ Outline:
• Key/value stores: Memcached, Berkeley DB, Membase, Riak.
• Extended key/value (data structures) store: Redis.
• Document databases: MongoDB.
♦ Core books (sources of information)

Tiwari S. Professional NoSQL, Wrox, 2011. — 384pp.

Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL
Movement, Pragmatic Bookshelf, 2012. — 352pp.
• Topic 18: Map/Reduce and Hadoop.
♦ Lectures: 8h. Practice: 8h. Self-study: 18h.
♦ Outline:
• Large-scale massive data processing.
• Map/reduce paradigm.
• Hadoop map/reduce implementation.
• Hadoop Distributed File System (HDFS).
• Hadoop I/O.
• Developing Hadoop applications.
• Managing job configuration.
• Mapper, Reducer, Combiner.
• Running a job locally and on a cluster.
• Map/reduce design patterns.
• Pig. Pig Latin.
• Hive. HiveQL.
• HBase: BigTable implementation.
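To make the Mapper, Reducer and Combiner roles in the outline above concrete, here is a minimal single-process Python sketch of the map/reduce paradigm applied to word counting. It does not use Hadoop or any Hadoop API: the function names (`mapper`, `combiner`, `shuffle`, `reducer`, `word_count`) are hypothetical, and the in-memory `shuffle` function only stands in for the grouping step that a real framework would perform across the cluster.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the input line."""
    return [(word.lower(), 1) for word in line.split()]


def combiner(pairs):
    """Combine phase: pre-aggregate one mapper's output before the shuffle."""
    partial = defaultdict(int)
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def shuffle(pairs):
    """Shuffle phase: group all values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reducer(word, counts):
    """Reduce phase: sum the partial counts for a single key."""
    return word, sum(counts)


def word_count(lines):
    """Run map, combine, shuffle and reduce over an iterable of text lines."""
    combined = chain.from_iterable(combiner(mapper(line)) for line in lines)
    return dict(reducer(word, counts) for word, counts in shuffle(combined).items())


if __name__ == "__main__":
    print(word_count(["to be or not to be", "to do"]))
```

The combiner is safe here only because addition is associative and commutative, so summing partial counts per line does not change the final result.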
♦ Core books (sources of information)

White T. Hadoop: The Definitive Guide, 4th edition, O’Reilly, 2015. — 756pp.

Holmes A. Hadoop In Practice, Manning Publications, 2012. — 511pp.

Miner D., Shook A. MapReduce Design Patterns, O’Reilly, 2012. — 232pp.

Gates A. Programming Pig, O’Reilly, 2011. — 203pp.

Sitto, Kevin, Presser, Marshall (2015) Field Guide to Hadoop. – O’Reilly, 2015. – 118 pp.

Gunarathne, Thilina (2015) Hadoop MapReduce v2 Cookbook, 2nd ed. – Packt, 2015.

Karanth, Sandeep (2014) Mastering Hadoop. – Packt, 2014. – 351 pp.
• Topic 19: Large-scale Distributed Databases.
♦ Lectures: 4h. Practice: 2h. Self-study: 8h.
♦ Outline:
• Hashing for distributed data storage.
• Consistent hashing.
• Distributed hash tables (DHT).
• Vector clocks and conflict detection.
• Gossip protocol and hinted handoff.
• Merkle trees.
• BigTable data model.
• Cassandra.
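The consistent hashing item in the outline above can be illustrated with a minimal Python sketch of a hash ring with virtual nodes. The class name `ConsistentHashRing`, the choice of SHA-256 as the ring hash, and the default of 100 virtual nodes per physical node are assumptions made for this example and are not taken from Cassandra or any other specific system.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a point on the ring (SHA-256 is an arbitrary choice)."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per physical node
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """Place the node's virtual nodes on the ring."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        """Remove all virtual nodes that belong to the given node."""
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def get_node(self, key):
        """Return the owner of the first ring position clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        index = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["A", "B", "C"])
    print(ring.get_node("user:42"))
```

The property worth observing is that adding a node only moves keys onto the new node; keys never move between the existing nodes, which is what distinguishes this scheme from `hash(key) % number_of_nodes`.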
♦ Core books (sources of information)

Tiwari S. Professional NoSQL, Wrox, 2011. — 384pp.

Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL
Movement, Pragmatic Bookshelf, 2012. — 352pp.

Neeraj, Nishant. (2013) Mastering Apache Cassandra. – Packt Publishing, 2013. – 340pp.

Bradberry, Russell, Lubow, Eric (2014) Practical Cassandra. A Developer’s Approach. – Addison
Wesley, 2014. – 193 pp.
• Topic 20: In-memory Databases.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline:
• In-memory database principles.
• Data storage layout, encoding and compression.
• In-memory data manipulation operations.
• In-memory queries and tuple reconstruction.
• Implementation issues: differential buffer, indices, merge process, logging, recovery, workload management.
• Implications for application development.
♦ Core books (sources of information)

Plattner H. (2013). A Course in In-Memory Data Management, Springer, 2013. – 298 pp.
• Topic 21: Data Stream Management Systems.
♦ Lectures: 2h. Practice: 2h. Self-study: 8h.
♦ Outline:
• Data stream example applications.
• Data stream systems: STREAM, Aurora.
• Data stream models: relational and XML.
• Window-based processing.
• STREAM CQL. Query semantics. Mapping streams to relations and vice versa.
• Aurora SQuAl. Operators.
• Query processing.
• STREAM: architecture, query plan, optimizations.
• Aurora: architecture, optimizations.
• Distributed stream processing: Borealis.
♦ Core books (sources of information)

Golab L., Özsu M. T. Data Stream Management (Synthesis Lectures on Data Management),
Morgan & Claypool Publishers, 2010. – 80 pp.

Arasu A., Babu S., Widom J. The CQL Continuous Query Language: Semantic Foundations
and Query Execution, VLDB Journal, 15(2), 2006. pp. 121-142.

Arasu A., Babcock B., Babu S., Cieslewicz J., Datar M., Ito K., Motwani R., Srivastava U., Widom
J. STREAM: The Stanford Data Stream Management System. // M. Garofalakis, J. Gehrke, and
R. Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams, Springer,
2009.

Cherniack M., Zdonik S. Stream-Oriented Query Languages and Operators, Encyclopedia of
Database Systems, Springer, 2009. pp. 2848-2854.
8 Methods of Instruction
Course studies are organized in the form of lectures and practical studies. Besides traditional
forms, some active and interactive forms are provided: discussion of real industry case studies;
proposing and discussing group project topics and their planned outcomes; and using interactive simulators for database languages.
9 Assessment tools for students' current evaluation and attestation
9.1 Topics for assignments
The course includes two home assignments (one in each year of study), compulsory for all students. Students work in groups of up to 5 on one of the suggested topics.
First home assignment: Build a relational database application using the techniques studied in
the course.
Second home assignment: Students should identify a Big Data problem that they would like to
work on and complete the following steps:
• Ask a difficult question.
• Identify data sources that you believe might help to answer the question.
• Build a data warehouse aimed at answering the question.
• Build an OLAP solution aimed at answering the question. Build a Hadoop (or NoSQL) solution aimed at answering the question.
• Compare the solutions built, and describe their relative strengths and weaknesses.
9.2 Topics for course final assessment
• Basic database system concepts.
• Database environment.
• Database planning.
• Three-level database architecture.
• Basic relational model concepts.
• Object-oriented model.
• Object-relational model.
• Semi-structured model.
• Mathematical and database relations. Relation schema. Relational database.
• Integrity constraints.
• Relation keys. Key constraint. Foreign key constraint.
• Relational algebra operations. Selection. Projection.
• Relational algebra operations. Set operations.
• Relational algebra operations. Join, equijoin, antijoin, theta-join. Natural join.
• Relational algebra operations. Division.
• Tuple relational calculus: atoms, formulas, queries.
• SQL data types.
• SQL. Table declaration. Primary keys, unique constraints, default values, nullable attributes.
• SQL. Check constraints.
• SQL. Foreign key constraints. Handling foreign key violations.
• SQL. Database schema modifications.
• SQL. Single-table queries. Filtering conditions. Logical operations IN, ALL, EXISTS.
• SQL. Join queries. Join types: cross, natural, inner, outer, self.
• SQL. Duplicate elimination.
• SQL. Set operations.
• SQL. Nested queries. Correlated nested queries.
• SQL. Aggregate functions.
• SQL. Grouping and group filtering.
• SQL. Query result sorting.
• SQL. INSERT.
• SQL. UPDATE.
• SQL. DELETE.
• SQL. Views: creation, use and updating.
• SQL. Triggers: creation, activation, execution. Multiple triggers.
• SQL. View materialization.
• Stored procedures.
• Entity-relationship model. Entities and attributes. Entity types. Keys.
• Entity-relationship model. Relationships. Attributes and roles. Relationship type, degree and cardinality. Relationship participation constraints.
• Entity type hierarchies. Specialization and generalization. Total and partial unions.
• Unified Modeling Language (UML) class diagram. Association and aggregation in UML. Generalization hierarchies in UML. Multiplicity indicators in UML.
• Objectives of normalization. Limitations of E/R design. Data redundancy.
• Anomalies: insertion, deletion, update.
• Functional dependencies. Axioms of functional dependencies.
• Closure. Minimal cover of a set of dependencies.
• Desirable properties of decompositions: attribute preservation, dependency preservation, lossless join.
• First normal form.
• Full functional dependencies. Second normal form.
• Transitive dependency. Third normal form. Boyce/Codd normal form (BCNF).
• Multivalued dependencies. Fourth normal form.
• Fifth normal form.
• Domain/Key normal form (DKNF).
• BCNF decomposition algorithm and its properties.
• Normalization drawbacks.
• JDBC architecture, connecting to DBMS, preparing and executing queries, using result sets and cursors.
• Handling exceptions in JDBC.
• Transactions in JDBC.
• Object-relational mapping.
• Design patterns for data persistence. Active Record pattern.
• Design patterns for data persistence. Data Mapper pattern.
• Hibernate.
• Datafiles: blocks and extents. Block structure. Fixed and variable record formats. Large objects (LOBs).
• B-tree index. B-tree insertions and deletions.
• Hash index. Hash functions. Extendible hashing.
• Bitmap index.
• Join index.
• GiST.
• Relational algebra translation. Query tree.
• Relational algebra equivalences.
• Cost-based optimization. Cost factors and estimation.
• External merge sort.
• Duplicate elimination.
• Implementing set operations.
• Sort-based and hash-based projection.
• Computing selection without indexes.
• Computing selection with clustered index.
• Computing selection with B-tree index.
• Computing selection with hash index.
• Computing joins: nested loops.
• Computing joins: block nested loops.
• Computing joins: sort-merge join.
• Computing joins: hash join.
• Basic concepts of transactions. ACID properties.
• Schedules: serial, serializable. Methods to ensure serializability.
• Optimistic and pessimistic concurrency.
• Two-phase locking. Locking and deadlocks. Implementing isolation levels with locks.
• Snapshot isolation.
• Write-ahead log. Redo and undo records. Recovery from crash.
• Distributed transactions. Two-phase commit protocol.
• Distributed database architectures.
• Software components and functions of distributed DBMS.
• Transaction management for distributed DBMS.
• Distributed recovery from failures.
• Distributed query processing.
• Data integration.
• Parallel database systems.
• Object-relational mapper.
• Data management requirements of decision support systems.
• Extract-transform-load process.
• Conceptual models for decision support.
• Multidimensional view of the data.
• Operations on data cubes: roll-up, drill-down, pivot, slice & dice, select.
• Query languages for supporting OLAP.
• SQL extensions: GROUP BY CUBE, GROUP BY ROLLUP.
• Multidimensional expressions (MDX).
• View materialization: optimal set of views, partial order on views, cost model, greedy algorithm.
• Relational OLAP (ROLAP): Star schema, snowflake schema, snowflake constellation.
• Multi-dimensional OLAP (MOLAP): multicubes and hypercubes, sparse and dense dimensions.
• Indexing of dimensions: bitmap indexes.
• Indexing of dimensions: join indexes.
• Hybrid OLAP (HOLAP).
• ETL process design and implementation.
• Data cleaning. Data quality dimensions, issues and constraints. Typical cleaning checks.
• Entity resolution.
• Data transformation.
• Loading data into the warehouse. Bulk load.
• Map/reduce paradigm.
• Hadoop map/reduce implementation.
• Hadoop Distributed File System (HDFS).
• Map/reduce design patterns.
• Pig. Pig Latin.
• Hive. HiveQL.
• HBase: BigTable implementation.
• Distributed hash tables (DHT).
• Vector clocks and conflict detection.
• Gossip protocol and hinted handoff.
• Key/value stores: properties and usage.
• Extended key/value (data structures) store Redis: properties and usage.
• Document database MongoDB: properties and usage.
• Large-scale distributed databases: Cassandra.
• In-memory database principles.
• Data storage layout, encoding and compression.
• In-memory data manipulation operations.
• In-memory queries and tuple reconstruction.
• Data stream models: relational and XML.
• Window-based processing.
• STREAM CQL. Query semantics. Mapping streams to relations and vice versa.
• Aurora SQuAl. Operators.
• STREAM: architecture, query plan, optimizations.
• Aurora: architecture, optimizations.
• Distributed stream processing: Borealis.
10 Learning resources
10.1 Core Textbooks

Silberschatz, A., Korth, H.F., Sudarshan, S. (2010) Database System Concepts, 6th ed,
McGraw-Hill, 2010. – 1376pp.

Garcia-Molina, H., Ullman, J., Widom, J. (2009) Database Systems: The Complete Book, 2nd
Edition, Prentice Hall, 2009. – 1248pp.

Elmasri, R., Navathe, S.B. (2010) Fundamentals of Database Systems, 6th ed., Addison Wesley,
2010. – 1200 pp.
10.2 Recommended books

Blaha Michael (2010) Patterns of Data Modeling (Emerging Directions in Database Systems and Applications), CRC Press, 2010. — 261pp.

Kuate P.H. et al. (2009) NHibernate in Action, Manning Publications, 2009. — 400pp.

Golfarelli M., Rizzi S. Data Warehouse Design: Modern Principles and Methodologies, McGraw-Hill Osborne Media, 2009. – 480pp.

Celko J. Joe Celko's Analytics and OLAP in SQL, Morgan Kaufmann, 2006. — 208pp.

Kimball R., Ross M. The Data Warehouse Toolkit, Wiley, 2002. — 447pp.

Kimball R., Caserta J. The Data Warehouse ETL Toolkit, Wiley, 2004. — 491pp.

Smith B.C., Clay C.R. Microsoft SQL Server 2008 MDX Step by Step, Microsoft Press, 2009. —
400pp.

Rodrigues F., Coles M., Dye D. Pro SQL Server 2012 Integration Services, Apress, 2012. — 636 pp.

Ben-Gan I. et al. Inside Microsoft SQL Server 2008: T-SQL Programming, Microsoft Press, 2009.
— 832 pp.

Melton J., Buxton S. Querying XML: XQuery, XPath, and SQL/XML in Context, Morgan Kaufmann, 2006. – 848pp.

Franks B. Taming The Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with
Advanced Analytics (Wiley and SAS Business Series), Wiley, 2012. — 336pp.

White T. Hadoop: The Definitive Guide, 4th edition, O’Reilly, 2015. – 756pp.

Redmond E. Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL
Movement, Pragmatic Bookshelf, 2012. — 352pp.

Robinson I. Graph Databases, O’Reilly, 2013. — 224pp.

Golab L., Özsu M. T. Data Stream Management (Synthesis Lectures on Data Management),
Morgan & Claypool Publishers, 2010. – 80 pp.
10.3 Additional books

Chaudhuri S., Dayal U. An overview of data warehousing and OLAP technology. // SIGMOD Record,
v.26 n.2, pp.507-508, 1997.

Harinarayan V., Rajaraman A., and Ullman J. Implementing Data Cubes Efficiently. // In Proceedings
of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD '96), pp. 205-216, 1996.

Jensen C., Pedersen T., Thomsen C. Multidimensional Databases and Data Warehousing, Morgan &
Claypool Publishers, 2010. — 111p.

Rainardi V. Building a Data Warehouse: With Examples in SQL Server. Apress, 2008. — 541pp.

Inmon W. H., Krishnan K. Building the Unstructured Data Warehouse: Architecture, Analysis, and
Design, Technics Publications, 2011. — 216pp.

Holmes A. Hadoop In Practice, Manning Publications, 2012. — 511pp.

Miner D., Shook A. MapReduce Design Patterns, O’Reilly, 2012. — 232pp.

Gates A. Programming Pig, O’Reilly, 2011. — 203pp.

Arasu A., Babu S., Widom J. The CQL Continuous Query Language: Semantic Foundations and
Query Execution, VLDB Journal, 15(2), 2006. pp.121-142.

Arasu A., Babcock B., Babu S., Cieslewicz J., Datar M., Ito K., Motwani R., Srivastava U., Widom J.
STREAM: The Stanford Data Stream Management System. // M. Garofalakis, J. Gehrke, and R.
Rastogi, editors, Data Stream Management: Processing High-Speed Data Streams, Springer, 2009.

Cherniack M., Zdonik S. Stream-Oriented Query Languages and Operators, Encyclopedia of Database Systems, Springer, 2009. pp 2848-2854.

Tiwari S. Professional NoSQL, Wrox, 2011. — 384pp.

Neeraj, Nishant. (2013) Mastering Apache Cassandra. – Packt Publishing, 2013. – 340pp.

Bradberry, Russell, Lubow, Eric (2014) Practical Cassandra. A Developer’s Approach. – Addison Wesley, 2014. – 193 pp.

Plattner H. (2013). A Course in In-Memory Data Management, Springer, 2013. – 298 pp.

Sitto, Kevin, Presser, Marshall (2015) Field Guide to Hadoop. – O’Reilly, 2015. – 118 pp.

Gunarathne, Thilina (2015) Hadoop MapReduce v2 Cookbook, 2nd ed. – Packt, 2015.

Karanth, Sandeep (2014) Mastering Hadoop. – Packt, 2014. – 351 pp.
10.4 Reference books, dictionaries, encyclopedias

MSDN. Available at: http://msdn.microsoft.com

RSDN. Databases. Available at: http://rsdn.ru/summary/248.xml

CITForum. Available at: http://www.citforum.ru/database

SQL.RU Available at: http://sql.ru

Liu L., Özsu M.T. (2009) Encyclopedia of Database Systems. Springer, 2009. — 748 pp.
10.5 Software tools
• Microsoft SQL Server 2008 R2 (or later)
• Microsoft SQL Server Analysis Services 2008 R2 (or later)
• Microsoft Visual Studio 2008-2010 (or later)
• Apache Hadoop 2
• Apache Pig
• Apache Hive
• Apache Cassandra
• MongoDB
10.6 Remote course support
LMS is used for remote course support.
11 Special Equipment
• Projector for lectures and practical studies.
• Computer classes with Microsoft Visual Studio 2010 (or later) and Microsoft SQL Server 2008 (or later) Management Studio installed.