ECO 725-01: Data Methods in Economics - Spring 2015 Chris Swann

ECO 725-01: Data Methods in Economics - Spring 2015
Chris Swann
446 Bryan Building
email: chris_swann@uncg.edu
Office Hours: Open Door or By Appt.
Course meeting time: Tuesday and Thursday, 9:30-10:45
Location: Bryan 211
Description
Econ 725 is a three-credit course in which students learn to work with large data sets using the SAS
programming language. In this course we will explore how to manipulate data (including reading,
writing, and combining data files), how to prepare data for research purposes (including variable
construction, sample selection, and issues related to missing data), and how to conduct basic data
analysis. We will pay attention to data quality and to how to deal with so-called “dirty” data.
Student Learning Outcomes
On completion of this course, students will have:
1) learned practical procedures for working with data;
2) learned the basics of the SAS programming language; and
3) conducted descriptive research with a large data set.
Procedures
ECO 725 will meet twice per week from 9:30 to 10:45 on Tuesday and Thursday. We will typically
meet in Bryan 211 though some days we may meet in Bryan 456. Students are expected to follow
the classroom discussion and exercises and to refrain from other activities, such as web-surfing, emailing, and game-playing, during class.
Additionally, I am experimenting with the possibility of moving this course online, so some
class sessions may not be taught face-to-face.
Your grade will be determined by a series of homework assignments (40% of grade), a midterm
(20% of grade) and a final project (40% of grade). Please note that assignments must be turned in
when they are due. Late assignments will not receive any credit, unless prior arrangements have
been made with the instructor.
There will be two components to the assignments. The first is basic exercises for learning particular skills.
The second is simple tasks that you will write up as formal responses. I hope
these will be useful if you need to provide writing samples when you apply for internships or jobs.
Software
The primary software package for this class is SAS. SAS is installed in the UNCG computer labs.
SAS licenses for personal computers are available for UNCG students through ITS. To begin the
license process, connect to https://web.uncg.edu/research-access/secure/sas/sas.asp. If you are a
Mac user, I strongly recommend you purchase Parallels rather than using SAS’s on-line Java
application, which generally will not work for what we want to do. We may also occasionally use Stata
and Excel though no specific knowledge of Stata is required, and if you have an alternate preferred
software package (e.g., SPSS) you should be able to use that as well.
Strongly Recommended Books
Delwiche, Lora D. and Susan J. Slaughter. 2012. The Little SAS® Book: A Primer, Fifth Edition,
Cary, NC: SAS Institute Inc. (LSB below)
Cody, Ron. 2008. Cody’s Data Cleaning Techniques Using SAS®, Second Edition. Cary, NC: SAS
Institute Inc. (C below)
Additional Books
You could amass a significant library of books about SAS. I list below a couple that may be handy
for this class.
Cody, Ron. 2007. Learning SAS by Example: A Programmer’s Guide. Cary, NC: SAS Institute Inc.
Comment: This is similar to The Little SAS Book.
DiIorio, Frank C., 1991. SAS Applications Programming: A Gentle Introduction, Duxbury Press.
Comment: This is an old one but still good for the basics.
More documentation than you can imagine is available at
http://support.sas.com/documentation/93/index.html. This is the link for SAS 9.3, which I
believe is the version available in the labs and from ITS.
Additional Readings
These will be made available on Blackboard. They include, but are not necessarily limited to:
Burns, S. 2013. “When Data and Reality Don’t Match.” in Q. McCallum (ed.) Bad Data Handbook.
O’Reilly.
Cody, R. 2011. “Longitudinal Data Techniques: Looking Across Observations.” Paper 265-2011.
Christen, P. 2012. “Data Pre-Processing.” in Data Matching: Concepts and Techniques for Record
Linkage, Entity Resolution, and Duplicate Detection. Springer-Verlag.
Dasu, T. 2012. “Data Glitches: Monsters in Your Data.” in S. Sadiq (ed.) Handbook of Data
Quality. Springer-Verlag.
Harrington, T. “An Introduction to SAS® PROC SQL.” Paper 70-27.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Basic Data Quality Tools.” in Data Quality and
Record Linkage Techniques. Springer-Verlag.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Automatic Editing and Imputation of Sample
Survey Data.” in Data Quality and Record Linkage Techniques. Springer-Verlag.
Kalt, M. and C. Zender. 2011. “Introduction to ODS Graphics for the Non-Statistician.” Paper 294-2011.
Li, A. 2013. “Essentials of the Program Data Vector: Directing the Aim to Understanding the Data
Step.” Paper 125-2013.
Pool, G. 2012. “Common Sense SAS – Documenting and Structuring Your Code.”
Ronk, K. “Introduction to Proc SQL.” Paper 268-29.
Schwabish, J. 2013. “Subtle Sources of Bias and Error.” in Q. McCallum (ed.) Bad Data Handbook.
O’Reilly.
Tian, S. 2009. “LAG - the Very Powerful and Easily Misused SAS® Function.” Paper 55-2009.
Vaisman, M. 2013. “The Dark Side of Data Science.” in Q. McCallum (ed.) Bad Data Handbook.
O’Reilly.
Zender, C. 2013. “Macro Basics for New SAS® Users.” Paper 120-2013.
SAS Certification
A number of levels of SAS Certification are available. To become certified with the SAS Base
Programmer for SAS 9 credential, you must pass an exam that covers many of the areas of
programming that we will use. Information on Base Programmer certification is available at
http://support.sas.com/certify/creds/bp.html. Because of the overlap in coverage, you are
encouraged to consider studying for and taking this exam. Note, however, that this is not a test prep
class, and we will cover some topics in more detail than may be necessary for the exam while others
included on the exam may not be covered at all. If you are thinking of taking the certification exam,
you should consider the prep guide:
SAS Publishing, SAS® Certification Prep Guide: Base Programming for SAS 9, Second Edition,
Cary, NC: SAS Institute Inc., 2009.
Research Integrity
Students are expected to be familiar with and abide by the University’s Academic Integrity policy
(see http://academicintegrity.uncg.edu/). In particular, students may be expected to work
independently on homework assignments and are expected to work independently on the project.
Assistance will be available from the instructor.
Tentative Outline
Date
Jan 13
Topic
Introduction to data analysis
Jan 15
Introduction to SAS
Jan 20
Before you begin: understanding your data
Jan 22
Jan 27
Jan 29
Feb 3
Reading text data into SAS
(Numeric) Variable construction: What do you want to create and how do you do it?
Basic SAS procedures
Feb 5
Feb 10
Feb 12
Character and date variables
Mechanics of the data step: what is actually going on?
Feb 17
Debugging your programs
Feb 19
Feb 24
Feb 26
Mar 3
Making your job easier: Introduction to Macros
Data verification (numeric, character, and dates)
Output Delivery System
Mar 5
Mar 17
Midterm
Graphing data
Mar 19
Mar 24
Missing data: why it exists, how to find it, and what to do.
Mar 26
Mar 31
Outliers
Combining data sets
Apr 2
Getting data out of SAS (e.g., text files, spreadsheets, Stata files)
Repeated observations and longitudinal data
More Macros
Apr 7
Apr 9
Apr 14
Apr 16
Apr 21
Apr 23
May 1
Reading
“Data Glitches: Monsters in Your Data”
“Subtle Sources of Bias and Error”
“The Dark Side of Data Science”
LSB: Chapter 1
“Common Sense SAS – Documenting and Structuring Your Code”
“When Data and Reality Don’t Match”
tp-66-highlight & CPS
swann-sheran & HIV
LSB: Chapter 2
Assignment
LSB: Chapter 3
Collect HW1
Hand Out HW2
Hand Out HW1
LSB: Chapter 4
“Essentials of the Program Data Vector”
“Programming with the KEEP, RETAIN, and DROP data set options”
LSB, Chapter 11
“How to Use the Data Step Debugger”
LSB: Chapter 7
“Macro Basics for New SAS Users”
C: Chapters 1, 2, 4
“Basic Data Quality Tools”
LSB: Chapter 5
Collect HW2
Hand Out HW3
Collect HW 3
Hand Out HW 4
LSB: Chapter 8
“Introduction to ODS Graphics for the Non-Statistician.”
C: Chapter 3
“Automatic Editing and Imputation of Sample Survey Data”
Collect HW 4
Hand Out HW 5
“Data Pre-Processing”
LSB: Chapter 6
C: Chapter 6
LSB: Chapter 10
Collect HW 5
Hand Out HW 6
Hand Out Project
“Longitudinal Data Techniques”
C: Chapter 5
TBA
Collect HW 6
Hand Out HW 7
Proc SQL
Summary/Catch-up
“An Introduction to SAS® PROC SQL”
“Introduction to Proc SQL”
Collect HW 7
Final Project Due at 12:00PM