STA 402/502 - Statistical Programming
Fall 2004

Course (Section): STA 402/502 (A)
Meeting Time: 9:00-9:50 MWF (plus other make-up times to be arranged in consultation with students)
Meeting Location: 260 Bachelor Hall
Prerequisites: STA 401/501; STA 671; or permission of the instructor
Professor: Dr. John Bailer
E-mail: baileraj@muohio.edu
URL: http://www.users.muohio.edu/~baileraj
Office (phone): 292 Bachelor Hall (529-3538); 369C Upham Hall (529-2648)
FAX: 529-1493
Office Hours: 10:00 - 11:00 MWF (other hours by appointment; don't be shy!)

Purpose of Course:
To introduce the use of computers to process and analyze data. Techniques and strategies for managing, manipulating and analyzing data are discussed. Emphasis is on the use of the SAS system. SAS data steps, including infile, input, merge, set, looping structures, conditional execution (if-then), etc., are presented. SAS mathematical, statistical and data functions are discussed, along with macro construction, extensive matrix manipulation and programming (PROC IML), and graphics procedures. Other quantitative programming environments (e.g. S-Plus, R) are considered for constructing specialized statistical analysis functions and graphical displays. Statistical computing topics, such as random number generation, randomization tests and Monte Carlo simulation, will be used to illustrate these programming ideas.

Course Objectives:
Develop programming and computing skills to address data analysis problems using statistical programming tools.

Texts:

Required:
[DS] Delwiche LD and Slaughter SJ. 2003. The Little SAS Book: A Primer, 3rd edition. SAS Institute. Cary, NC. ISBN 1-59047-333-7

Recommended:
[CP] Cody R and Pass R. 1995. SAS Programming by Example. SAS Institute. Cary, NC. ISBN 1-55544-681-7
[C] Cody R. 2004. SAS Functions by Example. SAS Institute. Cary, NC. ISBN 1-59047-378-7 (lots of great examples; worth browsing to see what you can do with functions in SAS)
[KO] Krause A and Olson M. 2004. The basics of S and S-Plus. Springer-Verlag, New York, NY. ISBN 0-387-95456-2

Other resources:
SAS: www.sas.com
     support.sas.com
SAS doc: www.muohio.edu/quantapps (via MU)
     http://www.units.muohio.edu/doc/sassystem/sasonlinedocv8/sasdoc/sashtml/main.htm
     http://support.sas.com/onlinedoc/912/docMainpage.jsp
S-Plus: www.insightful.com/products/splus/default.asp
     http://www.insightful.com/support/doc_splus_win.asp
R: www.r-project.org

Grading:
Homework and projects will contribute to the final grade. Homework will contribute 70% of the grade, while a mid-term project report and a final project report will together contribute the remaining 30%. Homework must be typed on a computer, with appropriate output included and annotated. Programs are expected to be internally documented with adequate comments.

* STA 502 Project: Students enrolled in STA 502 will be required to complete a larger-scale simulation study or a large data management/analysis project. Statistics graduate students should use this as an opportunity to begin exploring ideas for a Plan B project. Feel free to discuss possible projects with other faculty or me.

* Homework must be in my mailbox by 4 p.m. on the assigned due date in order to be considered.

Other dates of interest:
Sept. 6      LABOR DAY, no classes.
Sept. 7      Monday/Tuesday class exchange day (Monday classes meet).
Sept. 14     Last day to drop a class without a grade.
Oct. 12      Last day to drop a course with a grade of W or change to audit.
Oct. 15      Mid-term break
Nov. 1       No class/No office hours (NIEHS/SBB review meeting) - reschedule?
Nov. 10      No class/No office hours (NA/NRC/SEG review meeting) - reschedule?
Nov. 24-28   Thanksgiving break
Dec. 10      Classes end

Question: Can you meet on Tuesdays or Thursdays at 8 or 9 if we need to make up classes?
Tentative course outline

Each week lists the tentative topics, followed by the corresponding chapters in [DS], [CP] and [KO] where given.

Week 1: BASIC CONCEPTS ([DS] 1, 2; [CP] 1; [KO] n/a)
* Review basic concepts of statistical computing and research data management
* Introduce SAS data sets
* Review the form of SAS statements and SAS names
* Introduce SAS procedures
* Review the structure of SAS programs
* Describe SAS data libraries and what they can contain
* Show how to document SAS programs using comments
* Illustrate running SAS programs and basic debugging

Week 2: USING SAS PROCEDURES ([DS] 4; [CP] 9, 10, 12, 13; [KO] n/a)
* Introduce the idea of SAS system options
* Briefly review statements that can be used with most procedures (BY, WHERE, TITLE, FOOTNOTE, LABEL, FORMAT)
* PROC CONTENTS for describing a data set
* PROC PRINT for listing the observations in a data set
* PROC CHART and PROC PLOT for producing low-resolution graphs
* PROC FREQ for one-way frequency tables and n-way crosstabulations
* PROC UNIVARIATE for descriptive statistics and distributional information
* PROC MEANS for descriptive statistics
* PROC SORT for sorting a data set
* SAS documentation and the online help system

Week 3: REPORT WRITING ([DS] 4, 5; [KO] n/a)
* Introduce the Output Delivery System (ODS) for customizing procedure output
* PROC TABULATE for producing nicely formatted tables

Week 4: AN INTRODUCTION TO STATISTICAL MODELING ([DS] 8; [KO] n/a)
* PROC REG for linear modeling (a very basic introduction)
* PROC GLM for ANOVA models

Week 5: HIGH-RESOLUTION GRAPHICS AND FORMATS ([KO] n/a)
* Introduce concepts related to high-resolution graphs
* PROC GCHART and PROC GPLOT for producing high-resolution graphs
* SAS-supplied formats and PROC FORMAT for user-defined formats

Week 6: TRANSFORMING SAS DATA SETS ([DS] 3; [KO] n/a)
* Creating SAS data sets with DATA steps: flow of execution, including the program data vector
* Creating variables in DATA steps with assignment statements
* Statements: DATA, SET, OUTPUT, RETURN, WHERE, IF, DROP, KEEP, LENGTH
* Subsetting observations and variables
* Using SAS functions and operators
* Working with SAS date values (also time and date-time)
* Introduction to missing values

Week 7: SAS PROGRAMMING ([DS] 4; [KO] n/a)
* Declarative vs. executable statements
* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM
* Using formats in DATA steps
* Conditional execution
* DO groups
* Arrays
* More on missing values

Week 8: COMBINING AND MANAGING SAS DATA SETS ([DS] 6; [CP] 3; [KO] n/a)
* SET statement for concatenation and interleaving
* MERGE statement for joining observations
* UPDATE statement for updating a master file (maybe)
* Special variables: IN, END, FIRST, and LAST
* Creating multiple data sets in one DATA step
* Reshaping data sets
* Managing data sets using PROC COPY and PROC DATASETS
* Transporting data sets between hosts

Week 9: WRITING EXTERNAL FILES ([DS] 9; [KO] n/a)
* Statements: FILE, PUT
* Using DATA _NULL_
* PUT function
* Creating customized reports using DATA steps

Week 10: MACRO LANGUAGE ([DS] 7; [KO] n/a)
* Why use macros?
* Macro variables: system-defined and user-defined
* Macros
* Macro parameters
* Macro functions
* Conditional execution and DO loops
* CALL SYMPUT

Week 11: SAS/IML PROGRAMMING ([KO] n/a)
* Basic matrix concepts: rows, columns, scalars
* Matrix operators
* Subscripting
* Matrix functions
* Creating matrices from data sets and vice versa
* Sample applications

Weeks 12-15: TOPICS IN STATISTICAL PROGRAMMING (varies)
* Introduction to quantitative programming in S-Plus (objects: vectors, lists, matrices, dataframes; reading data [scan, read.table, sas.get]; summarizing data sets [mean, var, summary, table]; graphical displays [plot, pairs, trellis displays]; writing functions)
* R/S-Plus: Intro. & GUI (S-Plus) ([KO] 1, 2, 3)
* R/S-Plus: Data structures ([KO] 4)
* R/S-Plus: Basics & Trellis ([KO] 5, 6)
* R/S-Plus: Data description ([KO] 7)
* R/S-Plus: Modeling ([KO] 8)
* R/S-Plus: Programming & Functions ([KO] 9)
* R/S-Plus: I/O ([KO] 11)