ECO 725-01: Data Methods in Economics - Fall 2014

Chris Swann
446 Bryan Building
email: chris_swann@uncg.edu
Office Hours: Open Door or By Appt.

Course meeting time: TR 9:30-10:45
Location: Bryan 211

GA: Qing Shi – office hours TBA
email: q_shi@uncg.edu

Description

Econ 725 is a three-credit course in which students learn to work with large data sets using the SAS programming language. In this course we will explore how to manipulate data (including reading, writing, and combining data files), how to prepare data for research purposes (including variable construction, sample selection, and issues related to missing data), and how to conduct basic data analysis. We will pay attention to data quality and how to deal with so-called “dirty” data.

Student Learning Outcomes

On completion of this course, students will have:
1) learned practical procedures for working with data;
2) learned the basics of the SAS programming language; and
3) conducted descriptive research with a large data set.

Procedures

ECO 725 will meet twice per week from 9:30 to 10:45 on Tuesday and Thursday for the entire semester. We will typically meet in Bryan 211, though some days we may meet in Bryan 456. The school prohibits food and drink in the computer classrooms. Students are expected to follow the classroom discussion and exercises and to refrain from other activities, such as web-surfing, emailing, and game-playing, during class.

Your grade will be determined by a series of homework assignments (40% of grade), a midterm (20% of grade), and a final project (40% of grade). Please note that assignments must be turned in when they are due. Late assignments will not receive any credit unless prior arrangements have been made with the instructor.

Software

The primary software package for this class is SAS. SAS is installed in the UNCG computer labs. SAS licenses for personal computers are available for UNCG students through ITS.
To begin the license process, connect to https://web.uncg.edu/research-access/secure/sas/sas.asp. We will also occasionally use Stata and Excel, though no specific knowledge of Stata is required, and if you have an alternate preferred software package (e.g., SPSS) you should be able to use that as well.

Strongly Recommended Books

Delwiche, Lora D. and Susan J. Slaughter. 2012. The Little SAS® Book: A Primer, Fifth Edition. Cary, NC: SAS Institute Inc. (LSB below)

Cody, Ron. 2008. Cody’s Data Cleaning Techniques Using SAS®, Second Edition. Cary, NC: SAS Institute Inc. (C below)

Additional Books

You could amass a significant library of books about SAS. I list below a couple that may be handy for this class.

Cody, Ron. 2007. Learning SAS by Example: A Programmer’s Guide. Cary, NC: SAS Institute Inc.
Comment: This is similar to The Little SAS Book.

DiIorio, Frank C. 1991. SAS Applications Programming: A Gentle Introduction. Duxbury Press.
Comment: This is an old one but still good for the basics.

More documentation than you can imagine is available at http://support.sas.com/documentation/93/index.html. This is the link for SAS 9.3, which is what I believe is available in the labs and from ITS.

Additional Readings

These will be made available on Blackboard. These include but are not necessarily limited to:

Burns, S. 2013. “When Data and Reality Don’t Match.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Cody, R. 2011. “Longitudinal Data Techniques: Looking Across Observations.” Paper 265-2011.
Christen, P. 2012. “Data Pre-Processing.” in Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer-Verlag.
Dasu, T. 2012. “Data Glitches: Monsters in Your Data.” in S. Sadiq (ed.) Handbook of Data Quality. Springer-Verlag.
Harrington, T. “An Introduction to SAS PROC SQL.” Paper 70-27.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Basic Data Quality Tools.” in Data Quality and Record Linkage Techniques. Springer-Verlag.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Automatic Editing and Imputation of Sample Survey Data.” in Data Quality and Record Linkage Techniques. Springer-Verlag.
Kalt, M. and C. Zender. 2011. “Introduction to ODS Graphics for the Non-Statistician.” Paper 294-2011.
Li, A. 2013. “Essentials of the Program Data Vector: Directing the Aim to Understanding the Data Step.” Paper 125-2013.
Li, A. 2011. “The Essence of Data Step Programming.” Paper 269-2011.
Pool, G. 2012. “Common Sense SAS – Documenting and Structuring Your Code.”
Ronk, K. “Introduction to Proc SQL.” Paper 268-29.
Schwabish, J. 2013. “Subtle Sources of Bias and Error.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Tian, S. 2009. “LAG - the Very Powerful and Easily Misused SAS® Function.” Paper 55-2009.
Vaisman, M. 2013. “The Dark Side of Data Science.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Zender, C. 2013. “Macro Basics for New SAS® Users.” Paper 120-2013.

SAS Certification

A number of levels of SAS Certification are available. To become certified with the SAS Base Programmer for SAS 9 credential, you must pass an exam that covers many of the areas of programming that we will use. Information on Base Programmer certification is available at http://support.sas.com/certify/creds/bp.html. Because of the overlap in coverage, you are encouraged to consider studying for and taking this exam. Note, however, that this is not a test prep class: we will cover some topics in more detail than may be necessary for the exam, while other topics included on the exam may not be covered at all. If you are thinking of taking the certification exam, you should consider the prep guide:

SAS Publishing. 2009. SAS® Certification Prep Guide: Base Programming for SAS 9, Second Edition. Cary, NC: SAS Institute Inc.

Research Integrity

Students are expected to be familiar with and abide by the University’s Academic Integrity policy (see http://academicintegrity.uncg.edu/).
In particular, students may be expected to work independently on homework assignments and are expected to work independently on the project. Assistance will be available from the instructor and teaching assistant.

Tentative Outline

Date and Topic

Aug 19: Introduction to data analysis
Aug 21: Introduction to SAS
Aug 26: Before you begin: understanding your data

Remaining dates:
Aug 28, Sept 2, Sept 4, Sept 9, Sept 11, Sept 16, Sept 18, Sept 23, Sept 25, Sept 30, Oct 2, Oct 7, Oct 9, Oct 16, Oct 21, Oct 23, Oct 28, Oct 30, Nov 4, Nov 6, Nov 11, Nov 13, Nov 18, Nov 24, TBA

Remaining topics, in order:
Reading data into SAS and basic SAS procedures (e.g., proc contents, proc print, proc means, proc univariate, proc freq)
(Numeric) Variable construction: What do you want to create and how do you do it?
Character and date variables
Making your job easier: Introduction to Macros
Data verification (numeric, character, and dates)
Output Delivery System
Graphing data
Putting it together: understanding and characterizing your data
Midterm
Missing data: why it exists, how to find it, and what to do
Mechanics of the data step: what is actually going on?
Debugging your programs
Combining data sets
Getting data out of SAS (e.g., text files, spreadsheets, Stata files)
Repeated observations and longitudinal data
Estimation in SAS: linear and logistic regression
More estimation in SAS: probit, selection models, ordered models, and panel data
Proc SQL

Reading and Assignment, in order:
“Data Glitches: Monsters in Your Data”
“Subtle Sources of Bias and Error”
“The Dark Side of Data Science”
LSB: Chapter 1
“Common Sense SAS – Documenting and Structuring Your Code”
“When Data and Reality Don’t Match”
Hand Out HW1
LSB: Chapters 2, 4, 9
LSB: Chapter 3
Collect HW1
Hand Out HW2
LSB: Chapter 7
Collect HW2
Hand Out HW3
C: Chapters 1, 2, 4
“Basic Data Quality Tools”
LSB: Chapter 5
LSB: Chapter 8
“Introduction to ODS Graphics for the Non-Statistician”
TBA
C: Chapter 3
“Automatic Editing and Imputation of Sample Survey Data”
Collect HW3
Hand Out Project
Hand Out HW4
“Essentials of the Program Data Vector”
LSB: Chapter 11
“Data Pre-Processing”
LSB: Chapter 6
C: Chapter 6
LSB: Chapter 10
Collect HW4
Hand Out HW5
“Longitudinal Data Techniques”
“The Essence of Data Step Programming”
C: Chapter 5
LSB: 9.10
TBA
Collect HW5
“An Introduction to SAS PROC SQL”
“Introduction to Proc SQL”
Summary/Catch-up
Final Project Due
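
Grade Calculation Example

The grading weights stated under Procedures (homework 40%, midterm 20%, final project 40%) can be illustrated with a short calculation. The sketch below is written in Python rather than SAS only so that it can be run anywhere. It assumes each component has already been reduced to a single score on a 0-100 scale; the syllabus does not say how individual homework assignments are combined, so that step is left out. The function name course_score and the example scores are hypothetical.

```python
# Weights taken from the Procedures section of this syllabus.
WEIGHTS = {"homework": 0.40, "midterm": 0.20, "project": 0.40}


def course_score(homework, midterm, project):
    """Return the weighted course score on the same scale as the inputs.

    Assumption: each argument is one aggregate score (e.g., 0-100).
    How homework assignments are aggregated is not specified in the syllabus.
    """
    scores = {"homework": homework, "midterm": midterm, "project": project}
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = course_score(homework=90.0, midterm=80.0, project=85.0)
print(round(example, 1))  # 0.40*90 + 0.20*80 + 0.40*85 = 86.0
```

Because homework and the final project carry equal weight and together account for 80% of the grade, a missed (zero-credit) homework assignment can matter more than a few points on the midterm.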