Stat 512 Lab 1: Fall 2014
Let us start with two problems and see how we can use software to deal with them. In this
class we will be using SAS. SAS is a programming language, and it is Windows-based.
However, it is NOT a click-and-drag program; it requires you to write a
program and submit it.
Let's look at how a simple problem is done in SAS.
1. A large amount of alcohol is known to make people take longer to respond to a stimulus. To
investigate the effects of a small amount of alcohol, reaction times (time elapsed before
responding to a stimulus) for seven individuals were recorded before and after they consumed 2
ounces of 90-proof alcohol. Do the data suggest that consuming 2 ounces of alcohol makes
the average reaction time larger? Report your p-value.
Data:
Subject   1    2    3    4    5    6    7
before    .6   .8   .4   .7   .8   .8   .7
after     .7   .9   .6   .9   .9   .8   .9
First we need to enter the data.
There are many ways of entering data in SAS. The most important ones are:
1. Physically typing in the data
2. Importing a data set
1. Physically entering data:
* b = reaction time before alcohol, a = reaction time after;
data example1;
    input b a;
    cards;
.6 .7
.8 .9
.4 .6
.7 .9
.8 .9
.8 .8
.7 .9
;
run;
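After typing in data, it is worth checking that SAS read it the way you intended. A minimal sketch, assuming the data set example1 created above:

```sas
* Print the data set to confirm that all seven rows were read;
proc print data=example1;
run;
```

The listing should show seven observations with the two variables b and a.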
Let's say I had the following data saved as an Excel file named lab1.xls:
b     a
0.6   0.7
0.8   0.9
0.4   0.6
0.7   0.9
0.8   0.9
0.8   0.8
0.7   0.9
2. Importing an Excel File:
*IMPORTING DATA;
PROC IMPORT OUT= WORK.lab1
    DATAFILE= "C:\Users\dasgupta\Desktop\classes\512st\lab1data.xls"
    DBMS=EXCEL REPLACE;
    RANGE="Sheet1$";
    GETNAMES=YES;
    MIXED=NO;
    SCANTEXT=YES;
    USEDATE=YES;
    SCANTIME=YES;
RUN;
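Once the import has run, you may want to confirm that the variable names and values came through. A minimal sketch, assuming the import above succeeded and created WORK.lab1:

```sas
* List the variables and their types in the imported data set;
proc contents data=work.lab1;
run;

* Print the rows to compare against the spreadsheet;
proc print data=work.lab1;
run;
```

Because GETNAMES=YES was used, the column headers in the first row of the sheet should appear as the variable names.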
Problems to look out for

If you read a file that is wider than 80 columns, you may need to use the lrecl=
parameter on the infile statement.
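As a rough sketch of what that looks like (the file path, the record length of 500, and the variable names x, y, z are all hypothetical and only for illustration):

```sas
* Hypothetical path and variable names, for illustration only;
data wide;
    * lrecl= sets the maximum record (line) length SAS will read;
    infile "C:\mydata\widefile.txt" lrecl=500;
    input x y z;
run;
```

Choose an lrecl= value at least as large as the longest line in your file.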
Example 2:
5. Suppose you are the personnel manager for a company and you suspect a difference in
the mean length of time lost due to sickness for two types of employees: those who work at
night versus those who work during the day. In particular, you suspect that the mean for the
night shift exceeds the mean for the day shift. To check your theory, you randomly sample
the records for ten employees in each shift category and record the number of days lost
due to sickness within the past year. The employees are paired by age. Test the appropriate
hypothesis at α = .05 and report your p-value.
AGEGrp 1


Night|
Day |
21
13
2
3
4
5
6
7
8
9
10
10
5
14
16
33
0
7
7
2
18
19
17
6
3
4
24
12
1
Enter your data in SAS by physically typing it in.
Once you have the data in, we need to think about analysis.
Often, however, the first question is how to graph the data.
So let's look at the simplest graphics options in SAS for plotting data:
1. Line-printer quality plots from Proc Plot
2. High-resolution plots from Proc Gplot
You can use one or the other for any plotting you do in class.
Example for Proc Plot:
proc plot data=dataname;
plot yvar*xvar/hpos=somenumber vpos=somenumber;
run;
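For Example 1, a Proc Plot call might look like the sketch below. The hpos= and vpos= values (60 columns wide, 20 lines tall) are arbitrary choices for illustration, not recommended settings.

```sas
* Line-printer scatter plot of after (a) against before (b);
* hpos= and vpos= set the plot width and height in print positions;
proc plot data=example1;
    plot a*b / hpos=60 vpos=20;
run;
```

The first variable in the plot request goes on the vertical axis and the second on the horizontal axis.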
Example for Proc Gplot:
proc gplot data=dataname;
    * each symbol statement sets the plotting character and color for one overlaid plot;
    symbol1 v=plus c=black;
    symbol2 v=u c=black;
    symbol3 v=l c=black;
    plot var1*var2 var3*var2 var4*var2 / overlay;
run;
There are various options here, and you can get quite fancy with Gplots.
For example 1, we do this as follows:
proc gplot data=example1;
plot b*a;
run;
To analyze the data we need to use a PROCEDURE. First let us get simple descriptive
measures for the data. Procedures in SAS are called PROCS. To look at means and
variances we could use UNIVARIATE or CAPABILITY.
proc capability data=example1;
var a b;
histogram;
run;
proc univariate data=example1;
var a b;
run;
Example 1 calls for a paired t test, because each subject is measured both before and after. This is the code:
proc ttest data=example1;
    paired a*b;
run;
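By default Proc Ttest reports a two-sided p-value, while the question in Example 1 asks whether reaction time gets larger, which is a one-sided alternative. A sketch of two equivalent ways to request that, assuming SAS 9.2 or later (where the sides= option is available); the data set name example1d and the variable diff are made up for illustration:

```sas
* Upper-tailed paired t test: Ha is mean(a - b) > 0;
proc ttest data=example1 sides=u;
    paired a*b;
run;

* Equivalent approach: compute the differences, then run a one-sample test;
data example1d;
    set example1;
    diff = a - b;   * after minus before;
run;

proc ttest data=example1d h0=0 sides=u;
    var diff;
run;
```

As a check by hand, the seven differences sum to 0.9 and give t = 4.5 on 6 degrees of freedom, so both runs should report that t value. If your version of SAS does not support sides=, halve the two-sided p-value when the sample mean difference is in the direction of the alternative.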