Sociological Data Analysis


Annotation for the course «Sociological Data Analysis»

Department: Sociology

Chair: Methods and Techniques of Sociological Research

Program: Master's level, Sociology, first year

Author: Nikiforova, I. S., Ph.D.

Lecturer: Nikiforova, I. S., Ph.D. (Sociology of S&T, Georgia Institute of Technology)

1. Explanatory note

This course requires proficiency in R, a background in introductory data analysis, and knowledge of mathematics at the level of calculus and linear algebra. Programming experience is helpful but not required. Proficiency in English is a prerequisite for academic study, for understanding the instructor's explanations, and for reporting the results of analyses.

2. Teaching goals for the course

When working with large datasets, researchers often combine databases such as MySQL with the statistical tools provided in R. The goals of this course are 1) to introduce students to relational databases (MySQL and SQLite), the SQL language, and the tools for accessing SQL databases from R, and 2) to develop their analytical skills by applying basic data mining techniques in their analyses. Students will learn these advanced techniques by working on an individual project.

Learning these methods in English is essential because the underlying concepts were developed in English. Knowing these methods opens a range of new opportunities for students: work and study in English-speaking organizations, research opportunities, and collaboration with the international research community.

3. Thematic plan

No.  Theme                                              Class time  Self study  Total
1.   Introduction to databases and management systems   4           10          14
2.   Installation and management of MySQL and SQLite    4           10          14
3.   SQL language and queries                           4           10          14
4.   SQL advanced queries                               4           10          14
5.   Data manipulation and portability                  4           10          14
6.   Data analysis in R: Data mining techniques         8           10          18
7.   Data analysis in R: Visualization                  4           10          14
8.   Data analysis in R: Creation of scripts            4           10          14
9.   Creation and automation of reports                 4           8           12
10.  Common problems and new applications               8           8           16
     Total hours:                                       48          96          144

4. Brief topic overview

1. Introduction to databases and management systems

Database systems. Data models and classification. Structure of relational databases. Entity-Relationship (E-R) model and diagrams. Database design. Relations among tables and fields. Keys. Design of an E-R database schema.
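As an illustration of keys and relations among tables, here is a minimal sketch using an in-memory SQLite database. It is written in Python with the standard `sqlite3` module purely for demonstration (the course itself works with R and MySQL/SQLite). The tables `surveys` and `responses`, their columns, the sample rows, and the function name `run_keys_demo` are hypothetical and not part of the course materials.

```python
import sqlite3


def run_keys_demo():
    """Create two related tables and show a primary key / foreign key link."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Parent entity: each survey is identified by its primary key.
    conn.execute(
        "CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # Child entity: each response points to exactly one survey (one-to-many).
    conn.execute(
        """
        CREATE TABLE responses (
            response_id INTEGER PRIMARY KEY,
            survey_id   INTEGER NOT NULL REFERENCES surveys(survey_id),
            answer      TEXT NOT NULL
        )
        """
    )
    conn.execute("INSERT INTO surveys VALUES (1, 'Trust in institutions')")
    conn.executemany(
        "INSERT INTO responses VALUES (?, ?, ?)",
        [(10, 1, "high"), (11, 1, "low")],
    )

    # The relation is followed by joining on the shared key.
    joined = conn.execute(
        """
        SELECT s.title, r.answer
        FROM surveys AS s
        JOIN responses AS r ON r.survey_id = s.survey_id
        ORDER BY r.response_id
        """
    ).fetchall()

    # A response that refers to a non-existent survey violates the foreign key.
    try:
        conn.execute("INSERT INTO responses VALUES (12, 99, 'n/a')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return joined, rejected
```

In this sketch the foreign key turns the one-to-many line of an E-R diagram into a constraint that the database checks on every insert.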

2. Installation and management of MySQL and SQLite

Installation and setup. Connections. User privileges. Using the Query editor. The phpMyAdmin interface. Creating database views and exploring INFORMATION_SCHEMA. Navigation commands.

3. SQL language and queries

Introduction to SQL (Structured Query Language). Data types. Schema definition. Composing SQL statements. Basic structure: SELECT, FROM, WHERE. Insertion, update, and deletion of data.
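The basic statements named above can be sketched as follows, assuming an in-memory SQLite database accessed from Python's standard `sqlite3` module (chosen here only for illustration; in the course the same SQL is issued from MySQL, SQLite, or R). The table `respondents`, its columns and rows, and the function name `run_basic_sql_demo` are hypothetical.

```python
import sqlite3


def run_basic_sql_demo():
    """Schema definition followed by INSERT, SELECT ... WHERE, UPDATE, DELETE."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Schema definition with explicit data types.
    cur.execute(
        """
        CREATE TABLE respondents (
            id     INTEGER PRIMARY KEY,
            age    INTEGER NOT NULL,
            region TEXT    NOT NULL
        )
        """
    )

    # Insertion of data.
    cur.executemany(
        "INSERT INTO respondents (id, age, region) VALUES (?, ?, ?)",
        [(1, 23, "North"), (2, 41, "South"), (3, 35, "North")],
    )

    # Basic query structure: SELECT columns FROM table WHERE condition.
    cur.execute(
        "SELECT id, age FROM respondents WHERE region = ? ORDER BY id",
        ("North",),
    )
    north = cur.fetchall()

    # Update and deletion of data.
    cur.execute("UPDATE respondents SET age = 42 WHERE id = 2")
    cur.execute("DELETE FROM respondents WHERE id = 3")

    cur.execute("SELECT id, age, region FROM respondents ORDER BY id")
    remaining = cur.fetchall()

    conn.close()
    return north, remaining
```

Calling `run_basic_sql_demo()` returns the rows matched by the WHERE clause and the table contents after the update and deletion.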

4. SQL advanced queries

Advanced statements using IN, BETWEEN, LIKE, HAVING, GROUP BY, ANY, ALL, SOME, EXISTS, UNION, ORDER BY, and regular expressions. String operations. Ordering. Union, intersect, and except operations. Table aggregation.
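A few of these constructs can be sketched together, again assuming an in-memory SQLite database driven from Python's standard `sqlite3` module for illustration only. The table `answers`, its rows, and the function name `run_advanced_sql_demo` are hypothetical. ANY, ALL, and SOME are left out because SQLite does not support them.

```python
import sqlite3


def run_advanced_sql_demo():
    """Show GROUP BY / HAVING, BETWEEN / IN, and LIKE combined with UNION."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE answers (respondent TEXT, region TEXT, score INTEGER)"
    )
    cur.executemany(
        "INSERT INTO answers VALUES (?, ?, ?)",
        [
            ("Anna", "North", 4),
            ("Boris", "North", 2),
            ("Clara", "South", 5),
            ("Dmitri", "South", 3),
            ("Elena", "East", 1),
        ],
    )

    # Table aggregation: keep only regions with at least two respondents.
    cur.execute(
        """
        SELECT region, COUNT(*) AS n, AVG(score) AS mean_score
        FROM answers
        GROUP BY region
        HAVING COUNT(*) >= 2
        ORDER BY region
        """
    )
    by_region = cur.fetchall()

    # Range and set membership filters.
    cur.execute(
        """
        SELECT respondent
        FROM answers
        WHERE score BETWEEN 2 AND 4
          AND region IN ('North', 'South')
        ORDER BY respondent
        """
    )
    mid_scores = cur.fetchall()

    # Pattern matching combined with a set union of two queries.
    cur.execute(
        """
        SELECT respondent FROM answers WHERE respondent LIKE 'A%'
        UNION
        SELECT respondent FROM answers WHERE score = 5
        ORDER BY respondent
        """
    )
    union_rows = cur.fetchall()

    conn.close()
    return by_region, mid_scores, union_rows
```

HAVING filters groups after aggregation, whereas WHERE filters individual rows before it, which is why the single-respondent region is dropped from the first result.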

5. Data manipulation and portability

Input and output methods. Redirecting output. Portability to/from other applications: Excel, SPSS, R, MySQL, SQLite. ODBC drivers; the RODBC, sqldf, and DBI packages. Using SQL in R.

6. Data Analysis in R: Data mining techniques

Introduction to «data mining». Analysis of data structures: visualization, principal component analysis, and classification analysis.

7. Data Analysis in R: Visualization

R graphics functions. 3D graphics. Trellis graphics using data frames and matrices. Pictograms. Customization: arranging and annotating plots. The lattice package. The grid model.

8. Data Analysis in R: Creation of scripts

Characteristics of good scripts. Writing functions in R. File annotation.

9. Creation and automation of reports

Knowledge discovery. Creation of reports. Examples.


10. Common problems and new applications

Data mining challenges and prospects. Very large databases. Knowledge discovery and information retrieval.

5. Forms of assessment

Type of testing   Form of testing        Weight
Current           Project Assignment 1   10%
Intermediate      Project Assignment 2   30%
Final test        Project Presentation   60%

6. Literature

Main textbook

Spector, Phil. (2008). Data Manipulation with R (Use R!). New York: Springer.

Welling, Luke, & Thomson, Laura. (2003). MySQL Tutorial. Seattle: MySQL Press.

Other books

Torgo, Luis. (2010). Data Mining with R: Learning with Case Studies. Chapman & Hall/CRC.

Shipunov, Aleksey Borisovich et al. (2012). Наглядная статистика. Используем R! [Visual Statistics. Using R!]. Moscow: DMK Press.

7. Contact person

Irina Nikiforova, inikiforova@hse.ru
