STA402-syllabus-f04-24aug04

STA 402/502 – Statistical Programming
Fall 2004
Course (Section)
STA 402/502 (A)
Meeting Time:
9:00-9:50 MWF (plus other make-up times to be
arranged in consultation with students)
Meeting Location
260 Bachelor Hall
Prerequisites:
STA 401/501; STA 671; or permission of the
instructor
Professor
Dr. John Bailer
E-mail:
baileraj@muohio.edu
URL:
http://www.users.muohio.edu/~baileraj
Office (phone)
292 Bachelor Hall (529-3538)
369C Upham Hall (529-2648)
FAX: 529-1493
Office Hours
10:00 - 11:00 MWF
(other hours by appointment — don’t be shy!)
Purpose of Course:
To introduce the use of computers to process and analyze data. Techniques and strategies for
managing, manipulating and analyzing data are discussed. Emphasis is on the use of the SAS
system. SAS data steps including infile, input, merge, set, looping structures, conditional
execution (if-then), etc. are presented. SAS mathematical, statistical and data functions are
discussed along with discussion of macro construction, extensive matrix manipulation and
programming (PROC IML) and graphics procedures. Other quantitative programming
environments (e.g. S-Plus, R) are considered for constructing specialized statistical analysis
functions and graphical displays. Statistical computing topics, such as random number
generation, randomization tests and Monte Carlo simulation, will be used to illustrate these
programming ideas.
Course Objectives:
Develop programming and computing skills to address data analysis problems using statistical
programming tools.
Texts:
Required:
[DS] Delwiche LD and Slaughter SJ. 2003. The Little SAS Book: A Primer, 3rd edition. SAS
Institute. Cary, NC ISBN 1-59047-333-7
Recommended
[CP] Cody R and Pass R. 1995. SAS Programming by Example. SAS Institute. Cary, NC ISBN
1-55544-681-7
[C] Cody R. 2004. SAS Functions by Example. SAS Institute. Cary, NC ISBN 1-59047-378-7
- lots of great examples – worth browsing to see what you can do with functions in SAS
[KO] Krause A and Olson M. 2004. The Basics of S and S-Plus. Springer-Verlag, New York, NY
ISBN 0-387-95456-2
Other resources:
SAS:
www.sas.com
support.sas.com
SAS doc via MU:
www.muohio.edu/quantapps
http://www.units.muohio.edu/doc/sassystem/sasonlinedocv8/sasdoc/sashtml/main.htm
http://support.sas.com/onlinedoc/912/docMainpage.jsp
S-Plus
www.insightful.com/products/splus/default.asp
http://www.insightful.com/support/doc_splus_win.asp
R
www.r-project.org
Grading:
Homework and projects will contribute to the final grade. Homework will contribute 70% of the
grade, while a mid-term project report and a final project report will together contribute the
remaining 30%. Homework will be typed on a computer with appropriate output included
and annotated. It is expected that programs will be internally documented with adequate
amounts of commenting.
* STA 502 Project: Students enrolled in STA 502 will be required to complete a larger-scale
simulation study or a large data management/analysis project. Statistics graduate students should
use this as a chance to begin exploring options for a Plan B project. Feel free to
discuss possible projects with other faculty or me.
* Homework must be in my mailbox by 4 p.m. on the assigned due date in order to be considered.
Other dates of interest:
Sept. 6: LABOR DAY, no classes.
Sept. 7: Monday/Tuesday class exchange day (Monday classes meet).
Sept. 14: Last day to drop a class without a grade.
Oct. 12: Last day to drop a course with a grade of W or change to audit.
Oct. 15: Mid-term break
Nov. 1: No class/No office hours (NIEHS/SBB review meeting) – reschedule?
Nov. 10: No class/No office hours (NA/NRC/SEG review meeting) – reschedule?
Nov. 24-28: Thanksgiving break
Dec. 10: Classes end
Question: Can you meet on Tuesdays or Thursdays at 8 or 9 if we need to make up classes?
Tentative course outline
week
Tentative topics
[DS]
[CP]
[KO]
1
BASIC CONCEPTS
1,2
1
n/a
* Review basic concepts of statistical computing and research
data management
* Introduce SAS data sets
* Review the form of SAS Statements and SAS names
* Introduce SAS procedures
* Review the structure of SAS programs
* Describe SAS data libraries and what they can contain
* Show documenting SAS programs using comments
* Illustrate running SAS programs and basic debugging
2
USING SAS PROCEDURES
4
* Introduce the idea of SAS system options
* Briefly review statements that can be used with most
procedures (BY, WHERE, TITLE, FOOTNOTE, LABEL,
FORMAT)
9, 10, 12, 13
n/a
* PROC CONTENTS for describing a data set
* PROC PRINT for listing the observations in a data set
* PROC CHART and PROC PLOT for producing low resolution
graphs
* PROC FREQ for one-way frequency tables and n-way crosstabulations
* PROC UNIVARIATE for descriptive statistics and
distributional information
* PROC MEANS for descriptive statistics
* PROC SORT for sorting a data set
* SAS documentation and the online help system
3
REPORT WRITING
4,5
n/a
8
n/a
* Introduce the Output Delivery System (ODS) for customizing
procedure output
* PROC TABULATE for producing nicely-formatted tables
4
AN INTRODUCTION TO STATISTICAL MODELING
* PROC REG for linear modeling (a very basic introduction)
* PROC GLM for anova models
5
HIGH-RESOLUTION GRAPHICS AND FORMATS
* Introduce concepts related to high-resolution graphs
* PROC GCHART and PROC GPLOT for producing high-resolution graphs
* SAS-supplied formats and PROC FORMAT for user-defined
formats
5
n/a
6
TRANSFORMING SAS DATA SETS
3
n/a
4
n/a
* Creating SAS data sets with DATA steps: flow of execution,
including the program data vector
* Creating variables in DATA steps with assignment statements
* Statements: DATA, SET, OUTPUT, RETURN, WHERE, IF,
DROP, KEEP, LENGTH
* Subsetting observations and variables
* Using SAS functions and operators
* Working with SAS date values (also time and date-time)
* Introduction to missing values
7
SAS PROGRAMMING
* Declarative vs. executable statements
* Statements: RETAIN, RENAME, LABEL, FORMAT, SUM
* Using formats in DATA steps
* Conditional execution
* DO groups
* Arrays
* More on missing values
8
COMBINING AND MANAGING SAS DATA SETS
* SET statement for concatenation and interleaving
* MERGE statement for joining observations
* UPDATE statement for updating a master file (maybe)
* Special variables: IN, END, FIRST, and LAST
* Creating multiple data sets in one DATA step
* Reshaping data sets
* Managing data sets using PROC COPY and PROC
DATASETS
* Transporting data sets between hosts
6
6
3
n/a
9
WRITING EXTERNAL FILES
9
n/a
7
n/a
* Statements: FILE, PUT
* Using DATA _NULL_
* PUT function
* Creating customized reports using DATA steps
10
MACRO LANGUAGE
* Why use macros?
* Macro variables – system-defined and user-defined
* Macros
* Macro parameters
* Macro functions
* Conditional execution and DO loops
* CALL SYMPUT
11
SAS/IML Programming
n/a
* Basic matrix concepts: rows, columns, scalars
* matrix operators
* subscripting
* matrix functions
* creating matrices from data sets and vice versa
* sample applications
12-15
TOPICS IN STATISTICAL PROGRAMMING (varies)
* Introduction to quantitative programming in S-Plus (objects –
vectors, lists, matrices, dataframes; reading data [scan,
read.table, sas.get]; summarizing data sets [mean, var, summary,
table]; graphical displays [plot, pairs, trellis displays]; writing
functions).
R/S-Plus: Intro. & GUI (S-Plus)
1,2,3
R/S-Plus: Data structures
4
R/S-Plus: Basics & Trellis
5,6
R/S-Plus: Data description
7
R/S-Plus: Modeling
8
R/S-Plus: Programming & Functions
9
R/S-Plus: I/O
11