ECO 725-01: Data Methods in Economics - Spring 2015

Chris Swann
446 Bryan Building
email: chris_swann@uncg.edu
Office Hours: Open Door or By Appt.
Course meeting time: T/Th 9:30-10:45
Location: Bryan 211

Description

Econ 725 is a three-credit course in which students learn to work with large data sets using the SAS programming language. In this course we will explore how to manipulate data (including reading, writing, and combining data files), how to prepare data for research purposes (including variable construction, sample selection, and issues related to missing data), and how to conduct basic data analysis. We will pay attention to data quality and how to deal with so-called “dirty” data.

Student Learning Outcomes

On completion of this course, students will have:
1) learned practical procedures for working with data;
2) learned the basics of the SAS programming language; and
3) conducted descriptive research with a large data set.

Procedures

ECO 725 will meet twice per week from 9:30 to 10:45 on Tuesday and Thursday. We will typically meet in Bryan 211, though some days we may meet in Bryan 456. Students are expected to follow the classroom discussion and exercises and to refrain from other activities, such as web-surfing, emailing, and game-playing, during class. Additionally, I am experimenting with the possibility of moving this to an on-line class, so there may be some classes that are not taught face-to-face.

Your grade will be determined by a series of homework assignments (40% of grade), a midterm (20% of grade), and a final project (40% of grade). Please note that assignments must be turned in when they are due. Late assignments will not receive any credit unless prior arrangements have been made with the instructor.

There will be two components to the assignments. The first is basic exercises for learning particular skills. The second is simple tasks that you will write up as formal responses.
I hope these will be useful if you need to provide writing samples when you apply for internships or jobs.

Software

The primary software package for this class is SAS. SAS is installed in the UNCG computer labs. SAS licenses for personal computers are available for UNCG students through ITS. To begin the license process, connect to https://web.uncg.edu/research-access/secure/sas/sas.asp. If you are a Mac user, I strongly recommend you purchase Parallels rather than using SAS’s on-line Java application, which generally will not work for what we want to do. We may also occasionally use Stata and Excel, though no specific knowledge of Stata is required. If you have an alternate preferred software package (e.g., SPSS), you should be able to use that as well.

Strongly Recommended Books

Delwiche, Lora D. and Susan J. Slaughter. 2012. The Little SAS® Book: A Primer, Fifth Edition. Cary, NC: SAS Institute Inc. (LSB below)
Cody, Ron. 2008. Cody’s Data Cleaning Techniques Using SAS®, Second Edition. Cary, NC: SAS Institute Inc. (C below)

Additional Books

You could amass a significant library of books about SAS. I list below a couple that may be handy for this class.

Cody, Ron. 2007. Learning SAS by Example: A Programmer’s Guide. Cary, NC: SAS Institute Inc. Comment: This is similar to The Little SAS Book.
DiIorio, Frank C. 1991. SAS Applications Programming: A Gentle Introduction. Duxbury Press. Comment: This is an old one but still good for the basics.

More documentation than you can imagine is available at http://support.sas.com/documentation/93/index.html. This is the link for SAS 9.3, which I believe is the version available in the labs and from ITS.

Additional Readings

These will be made available on Blackboard. They include, but are not necessarily limited to:

Burns, S. 2013. “When Data and Reality Don’t Match.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Cody, R. 2011. “Longitudinal Data Techniques: Looking Across Observations.” Paper 265-2011.
Christen, P. 2012.
“Data Pre-Processing.” in Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection. Springer-Verlag.
Dasu, T. 2012. “Data Glitches: Monsters in Your Data.” in S. Sadiq (ed.) Handbook of Data Quality. Springer-Verlag.
Harrington, T. “An Introduction to SAS® PROC SQL.” Paper 70-27.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Basic Data Quality Tools.” in Data Quality and Record Linkage Techniques. Springer-Verlag.
Herzog, T., F. Scheuren, and W. Winkler. 2007. “Automatic Editing and Imputation of Sample Survey Data.” in Data Quality and Record Linkage Techniques. Springer-Verlag.
Kalt, M. and C. Zender. 2011. “Introduction to ODS Graphics for the Non-Statistician.” Paper 294-2011.
Li, A. 2013. “Essentials of the Program Data Vector: Directing the Aim to Understanding the Data Step.” Paper 125-2013.
Pool, G. 2012. “Common Sense SAS – Documenting and Structuring Your Code.”
Ronk, K. “Introduction to Proc SQL.” Paper 268-29.
Schwabish, J. 2013. “Subtle Sources of Bias and Error.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Tian, S. 2009. “LAG - the Very Powerful and Easily Misused SAS® Function.” Paper 55-2009.
Vaisman, M. 2013. “The Dark Side of Data Science.” in Q. McCallum (ed.) Bad Data Handbook. O’Reilly.
Zender, C. 2013. “Macro Basics for New SAS® Users.” Paper 120-2013.

SAS Certification

A number of levels of SAS Certification are available. To become certified with the SAS Base Programmer for SAS 9 credential, you must pass an exam that covers many of the areas of programming that we will use. Information on Base Programmer certification is available at http://support.sas.com/certify/creds/bp.html. Because of the overlap in coverage, you are encouraged to consider studying for and taking this exam. Note, however, that this is not a test prep class: we will cover some topics in more detail than may be necessary for the exam, while other topics included on the exam may not be covered at all.
If you are thinking of taking the certification exam, you should consider the prep guide: SAS Publishing, SAS® Certification Prep Guide: Base Programming for SAS 9, Second Edition, Cary, NC: SAS Institute Inc., 2009.

Research Integrity

Students are expected to be familiar with and abide by the University’s Academic Integrity policy (see http://academicintegrity.uncg.edu/). In particular, students may be expected to work independently on homework assignments and are expected to work independently on the project. Assistance will be available from the instructor.

Tentative Outline

Date / Topic

Jan 13: Introduction to data analysis
Jan 15: Introduction to SAS
Jan 20: Before you begin: understanding your data
Jan 22, Jan 27, Jan 29, Feb 3:
  Reading text data into SAS
  (Numeric) Variable construction: What do you want to create and how do you do it?
  Basic SAS procedures
Feb 5, Feb 10, Feb 12:
  Character and date variables
  Mechanics of the data step: what is actually going on?
Feb 17: Debugging your programs
Feb 19, Feb 24, Feb 26, Mar 3:
  Making your job easier: Introduction to Macros
  Data verification (numeric, character, and dates)
  Output Delivery System
Mar 5, Mar 17:
  Midterm
  Graphing data
Mar 19, Mar 24:
  Missing data: why it exists, how to find it, and what to do.
Mar 26, Mar 31:
  Outliers
  Combining data sets
Apr 2, Apr 7, Apr 9, Apr 14, Apr 16, Apr 21, Apr 23:
  Getting data out of SAS (e.g., text files, spreadsheets, Stata files)
  Repeated observations and longitudinal data
  More Macros
  Proc SQL
  Summary/Catch-up
May 1: Final Project Due at 12:00PM

Reading (in schedule order)

“Data Glitches: Monsters in Your Data”
“Subtle Sources of Bias and Error”
“The Dark Side of Data Science”
LSB: Chapter 1
“Common Sense SAS – Documenting and Structuring Your Code”
“When Data and Reality Don’t Match”
tp-66-highlight & CPS swann-sheran & HIV
LSB: Chapter 2
LSB: Chapter 3
LSB: Chapter 4
“Essentials of the Program Data Vector”
“Programming with the KEEP, RETAIN, and DROP data set options”
LSB: Chapter 11
“How to Use the Data Step Debugger”
LSB: Chapter 7
“Macro Basics for New SAS Users”
C: Chapters 1, 2, 4
“Basic Data Quality Tools”
LSB: Chapter 5
LSB: Chapter 8
“Introduction to ODS Graphics for the Non-Statistician”
C: Chapter 3
“Automatic Editing and Imputation of Sample Survey Data”
“Data Pre-Processing”
LSB: Chapter 6
C: Chapter 6
LSB: Chapter 10
“Longitudinal Data Techniques”
C: Chapter 5
TBA
“An Introduction to SAS® PROC SQL”
“Introduction to Proc SQL”

Assignment (in schedule order)

Hand Out HW 1
Collect HW 1, Hand Out HW 2
Collect HW 2, Hand Out HW 3
Collect HW 3, Hand Out HW 4
Collect HW 4, Hand Out HW 5
Collect HW 5, Hand Out HW 6, Hand Out Project
Collect HW 6, Hand Out HW 7
Collect HW 7