Department: Sociology
Chair (kafedra): Methods and Techniques of Sociological Research
Program: Master's level, Sociology, first year
Author: Nikiforova, I. S., Ph.D.
Lecturer: Nikiforova, I. S., Ph.D., Sociology of Science and Technology, Georgia Institute of Technology
1. Explanatory note
This course requires proficiency in R, familiarity with introductory data analysis, and knowledge of mathematics at the level of calculus and linear algebra. Programming experience is helpful but not required. Proficiency in English is a prerequisite: students need it for the coursework, for understanding the instructor's explanations, and for reporting the results of their analyses.
2. Teaching goals for the course
When working with large datasets, researchers often combine databases such as MySQL with the statistical tools provided in R. The goals of this course are 1) to introduce students to relational databases (MySQL and SQLite), the SQL language, and the tools for accessing SQL databases from R, and 2) to develop their analytical skills by applying basic data mining techniques in their analyses. Students will learn these techniques by working on an individual project.
Learning these methods in English is essential because their concepts and terminology were developed in English. Knowing these methods will open a range of new opportunities for students: work and study in English-speaking organizations, research opportunities, and collaboration with the international research community.
3. Thematic plan
No.  Theme                                              Class time  Self study  Total
1.   Introduction to databases and management systems   4           10          14
2.   Installation and management of MySQL and SQLite    4           10          14
3.   SQL language and queries                           4           10          14
4.   SQL advanced queries                               4           10          14
5.   Data manipulation and portability                  4           10          14
6.   Data analysis in R: Data mining techniques         8           10          18
7.   Data analysis in R: Visualization                  4           10          14
8.   Data analysis in R: Creation of scripts            4           10          14
9.   Creation and automation of reports                 4           8           12
10.  Common problems and new applications               8           8           16
     Total hours:                                       48          80          144
4. Brief topic overview
1. Introduction to databases and management systems
Database systems. Data models and classification. Structure of relational databases. Entity-Relationship (E-R) model and diagrams. Database design. Relations among tables and fields. Keys. Design of an E-R database schema.
2. Installation and management of MySQL and SQLite
Installation and setup. Connections. User privileges. Using the query editor. The phpMyAdmin interface. Creating database views and exploring INFORMATION_SCHEMA. Navigation commands.
3. SQL language and queries
Introduction to SQL (Structured Query Language). Data types. Schema definition. Composing SQL statements. Basic structure: SELECT, FROM, WHERE. Insertion, update, and deletion of data.
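The statements named in this topic can be sketched as follows. This is a minimal illustration, not course material: the `respondents` table and its rows are hypothetical, and Python's standard `sqlite3` module is used here only as a self-contained way to run SQL against an in-memory SQLite database (the course itself accesses databases from R).

```python
import sqlite3

# Hypothetical survey table; all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Schema definition with explicit data types.
cur.execute("""
    CREATE TABLE respondents (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER,
        city TEXT
    )
""")

# Insertion of several rows at once.
cur.executemany(
    "INSERT INTO respondents (id, name, age, city) VALUES (?, ?, ?, ?)",
    [(1, "Anna", 34, "Moscow"), (2, "Boris", 51, "Kazan"), (3, "Vera", 27, "Moscow")],
)

# Basic query structure: SELECT ... FROM ... WHERE.
moscow_names = [
    row[0]
    for row in cur.execute(
        "SELECT name FROM respondents WHERE city = ? ORDER BY id", ("Moscow",)
    )
]

# Update and deletion of data.
cur.execute("UPDATE respondents SET age = age + 1 WHERE id = 2")
cur.execute("DELETE FROM respondents WHERE id = 3")

remaining = cur.execute("SELECT id, age FROM respondents ORDER BY id").fetchall()
conn.commit()
conn.close()

print(moscow_names)
print(remaining)
```

The `?` placeholders pass values separately from the SQL text, which avoids quoting mistakes when values come from user input.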
4. SQL advanced queries
Advanced statements using IN, BETWEEN, LIKE, HAVING, GROUP BY, ANY, ALL, SOME, EXISTS, UNION, ORDER BY, and regular expressions. String operations. Ordering. UNION, INTERSECT, and EXCEPT operations. Table aggregation.
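A few of these constructs are sketched below under stated assumptions: the `answers` table and its data are hypothetical, and the SQL is run through Python's standard `sqlite3` module against an in-memory SQLite database purely for illustration. Only constructs that SQLite supports are shown (GROUP BY with HAVING, IN, BETWEEN, LIKE, UNION, ORDER BY).

```python
import sqlite3

# Hypothetical table of survey scores; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE answers (respondent TEXT, city TEXT, score INTEGER)")
cur.executemany(
    "INSERT INTO answers VALUES (?, ?, ?)",
    [
        ("Anna", "Moscow", 7),
        ("Boris", "Kazan", 4),
        ("Vera", "Moscow", 9),
        ("Gleb", "Kazan", 6),
        ("Dina", "Perm", 8),
    ],
)

# Table aggregation: mean score per city, keeping only cities with 2+ rows.
city_means = cur.execute("""
    SELECT city, AVG(score) AS mean_score
    FROM answers
    GROUP BY city
    HAVING COUNT(*) >= 2
    ORDER BY mean_score DESC
""").fetchall()

# Filtering with IN, BETWEEN, and a LIKE pattern (names ending in "a").
filtered = cur.execute("""
    SELECT respondent
    FROM answers
    WHERE city IN ('Moscow', 'Perm')
      AND score BETWEEN 7 AND 8
      AND respondent LIKE '%a'
    ORDER BY respondent
""").fetchall()

# Set operation: UNION combines two SELECTs and removes duplicate rows.
union_cities = cur.execute("""
    SELECT city FROM answers WHERE score >= 8
    UNION
    SELECT city FROM answers WHERE score <= 4
    ORDER BY city
""").fetchall()

conn.close()

print(city_means)
print(filtered)
print(union_cities)
```

HAVING filters groups after aggregation, whereas WHERE filters individual rows before it; this is why the row-count condition appears in HAVING.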
5. Data manipulation and portability
Input and output methods. Redirecting output. Portability to and from other applications: Excel, SPSS, R, MySQL, SQLite. ODBC drivers; the RODBC, sqldf, and DBI packages. Using SQL in R.
6. Data analysis in R: Data mining techniques
Introduction to data mining. Analysis of data structures: visualization, principal component analysis, and classification analysis.
7. Data analysis in R: Visualization
R graphics functions. 3D graphics. Trellis graphics using data frames and matrices. Pictograms. Customization: arranging and annotating plots. The lattice package and the grid graphics model.
8. Data analysis in R: Creation of scripts
Characteristics of good scripts. Writing functions in R. File annotation.
9. Creation and automation of reports
Knowledge discovery. Creation of reports. Examples.
10. Common problems and new applications
Data mining challenges and prospects. Very large databases. Knowledge discovery and information retrieval.
5. Forms of control
Type of testing   Form of testing        Parameters
Current           Project Assignment 1   10%
Intermediate      Project Assignment 2   30%
Final test        Project Presentation   60%
6. Literature
Main textbooks
Spector, Phil. (2008). Data Manipulation with R (Use R!). New York: Springer.
Welling, Luke, & Thomson, Laura. (2003). MySQL Tutorial. Seattle: MySQL Press.
Other books
Torgo, Luis. (2010). Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC.
Shipunov, Aleksey Borisovich et al. (2012). Наглядная статистика. Используем R! [Visual Statistics. Using R!]. Moscow: DMK Press.
7. Contact person
Irina Nikiforova, inikiforova@hse.ru