Stat 530 Lab 1: Spring 2016
Part 1: SAS
Let us start with two problems and see how we can use software to deal with them. In this
class we will use SAS as the primary software. SAS is a programming language that
runs in a windowed environment. However, it is NOT a point-and-click program: you
write a program and submit it.
Let's look at how a simple problem is done in SAS.
Consider this data set: 25 students wrote down their height, age, shoe size, eye color
and hair color.
So, let's see how we can read this data set into SAS.
id  height  age  shoe  eyecolor  hair
 1      62   18   6.5  brown     blonde
 2      69   19    11  hazel     auburn
 3      71   20    13  brown     brown
 4      69   20   6.5  hazel     blonde
 5      66   20     8  blue      blonde
 6      63   18   7.5  brown     brown
 7      66   20   9.5  blue      blonde
 8      65   20   8.5  blue      blonde
 9      67   19   9.5  brown     brown
10      67   23   8.5  brown     brown
11      67   19   9.5  green     brown
12      69    .   8.5  hazel     blonde
13      66   20    10  blue      blonde
14      70   20   9.5  hazel     brown
15      69   20   9.5  hazel     brown
16      72   20    11  blue      blonde
17      70   19  10.5  blue      brown
18      71   24  12.5  blue      brown
19      63   19     8  green     blonde
20      65   44     7  blue      brown
21      68   22   9.5  hazel     brown
22      69   20    10  hazel     brown
23      70   35    10  brown     brown
24      70   22    10  blue      brown
25      64   19   7.5  blue      brown
There are many ways of entering data in SAS. The most important ones are:
1. Typing in the data directly
2. Importing a data set from a file
1. Typing in the data directly:
data example1;
  input id height age shoesize eyecolor $ haircolor $;
  cards;
1 62 18 6.5 brown blonde
2 69 19 11 hazel auburn
3 71 20 13 brown brown
4 69 20 6.5 hazel blonde
5 66 20 8 blue blonde
6 63 18 7.5 brown brown
7 66 20 9.5 blue blonde
8 65 20 8.5 blue blonde
9 67 19 9.5 brown brown
10 67 23 8.5 brown brown
11 67 19 9.5 green brown
12 69 . 8.5 hazel blonde
13 66 20 10 blue blonde
14 70 20 9.5 hazel brown
15 69 20 9.5 hazel brown
16 72 20 11 blue blonde
17 70 19 10.5 blue brown
18 71 24 12.5 blue brown
19 63 19 8 green blonde
20 65 44 7 blue brown
21 68 22 9.5 hazel brown
22 69 20 10 hazel brown
23 70 35 10 brown brown
24 70 22 10 blue brown
25 64 19 7.5 blue brown
;
run;
The dollar sign ($) after a variable name in the INPUT statement tells SAS that the
variable is character (categorical). The period in the age column for student 12 is read
as a missing value.
2. Importing an Excel file:
Let's say the same data were saved as an Excel file named class1.xls (or as class1.csv).
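*IMPORTING DATA;
/* Reads Sheet1 of the workbook into the SAS data set WORK.lab1.          */
/* Replace the placeholder path with the full path to your file.          */
/* For class1.csv, use DBMS=CSV and drop the Excel-only statements        */
/* (RANGE, MIXED, SCANTEXT, USEDATE, SCANTIME).                           */
PROC IMPORT OUT= WORK.lab1
DATAFILE= "YOUR PATH WHERE FILE IS SAVED"
DBMS=EXCEL REPLACE;
RANGE="Sheet1$";
GETNAMES=YES;   /* first row of the sheet holds the variable names */
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;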
Problems to look out for

If you read a file that is wider than 80 columns, you may need to use the lrecl=
parameter on the infile statement.
Once the data are in, we need to think about analysis.
Often the first questions are about graphics and summaries, so let's look at the simplest
graphics options in SAS.
PLOTTING DATA
SAS can produce:
1. line-printer quality plots with PROC PLOT, or
2. high-resolution plots with PROC GPLOT.
You can use either one for any plotting you do in class.
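Example of PROC PLOT (dataname, yvar, xvar and somenumber are placeholders; hpos= and
vpos= set the number of print positions across and down the plot):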
proc plot data=dataname;
plot yvar*xvar/hpos=somenumber vpos=somenumber;
run;
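Example of PROC GPLOT (again with placeholder names):
symbol1 v=plus c=black;   /* symbol and color for the 1st plot request */
symbol2 v=u c=black;      /* ... for the 2nd plot request */
symbol3 v=l c=black;      /* ... for the 3rd plot request */
proc gplot data=dataname;
plot var1*var2 var3*var2 var4*var2 / overlay;   /* all three on one set of axes */
run;
quit;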
There are various options here, and you can get quite fancy with Gplots.
For the example1 data, a plot of shoe size against height is:
proc gplot data=example1;
plot shoesize*height;
run;
quit;
For the categorical variables we might want bar charts or pie charts.
proc gchart data=example1;
pie haircolor eyecolor;
run;
proc gchart data=example1;
pie3d haircolor eyecolor/
explode="brown";
run;
proc gchart data=example1;
hbar3d haircolor eyecolor;
run;
To summarize the data we use a PROCEDURE; procedures in SAS are called PROCs. First let
us get simple descriptive measures for the data. To look at means, standard deviations
and other summaries we can use MEANS, UNIVARIATE or CAPABILITY (PROC CAPABILITY is
part of SAS/QC).
proc means data=example1;
var height shoesize age;
run;
proc univariate data=example1;
var height shoesize age;
run;
proc capability data=example1;
var height shoesize age;
run;
For simple descriptive measures of the categorical data we can use PROC FREQ. The first
step gives one-way frequency tables; haircolor*eyecolor gives a two-way table.
proc freq data=example1;
table haircolor eyecolor;
run;
proc freq data=example1;
table haircolor*eyecolor;
run;
Part 2: R
For those of us who are R-inclined:
Data in R can be of various types.
1. Vectors: one-dimensional data.
Example:
x1=c(1,2,3,5,7)
x2=c(2,7,8,4,1)
2. Matrix: two-dimensional data, with rows and columns.
Example: m1=cbind(x1,x2)
OR
m2=matrix(c(1,2,3,5,7,2,7,8,4,1),nrow=5,ncol=2)
3. Data frames (the type we will use the most): a data set whose columns (variables)
all have the same length but can be of different types, such as numeric and character.
mydata= data.frame(x1,x2,m1,m2)
4. Arrays: matrices extended to more than two dimensions, for example a stack of matrices.
5. Lists: collections of data frames, matrices, vectors, etc. The components can have
different sizes and need not all be numeric.
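Arrays and lists are described above without examples. Here is a small sketch that reuses the vectors x1 and x2 from the examples above; the object names a1 and mylist are hypothetical, chosen only for illustration.

```r
# Same vectors as in the examples above
x1 <- c(1, 2, 3, 5, 7)
x2 <- c(2, 7, 8, 4, 1)

# An array extends a matrix to more than two dimensions:
# 2 rows x 3 columns x 2 layers, i.e. two 2x3 matrices stacked.
# Values fill column by column, so layer 1 holds 1..6 and layer 2 holds 7..12.
a1 <- array(1:12, dim = c(2, 3, 2))
a1[, , 1]      # the first 2x3 layer
a1[2, 3, 2]    # row 2, column 3, layer 2: the value 12

# A list can hold components of different types and lengths
mylist <- list(numbers = x1,
               colors  = c("brown", "hazel", "blue"),
               m       = cbind(x1, x2))
length(mylist)        # 3 components
mylist$colors         # extract a component by name
mylist[["m"]][, 2]    # second column of the matrix component (same values as x2)
```

The `$` and `[[ ]]` forms used on the list here are the same ones used below to pull columns out of a data frame, since a data frame is a special kind of list.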
Inputting data from a file: I will show you how to read a CSV file, a format that will
be common for us.
Put the data file in the correct directory and make sure that directory is your working
directory. I created a directory called myRfolder on my C drive and will try to use it as
my default directory. In it I put a data set called classdata.csv (the class data on the
students' height, age, shoe size, eye color and hair color).
Use the commands:
> setwd("/myRfolder")
> mydata <- read.table("classdata.csv", header=TRUE, sep=",", row.names="id", na.strings=" ")
> mydata   # print the data set
The data are stored in R as a data frame called mydata. It is a temporary data set and
will be lost when we quit R.
A data frame can be indexed like a matrix. Because id was used for the row names, height
is the first column, so to get a summary measure for height we can refer to it as
mydata[ ,1], or use the $ sign: mydata$height.
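To try read.table without a file on disk, the sketch below passes three rows of the class data (students 10 to 12) through the text= argument instead of a file name. The header names id, height, age, shoe, eye, hair are an assumption about classdata.csv, chosen to match the mydata$shoe, mydata$eye and mydata$hair calls in this lab; the missing age for student 12 is written as an empty field.

```r
# ASSUMED column names; the real classdata.csv may use different ones.
csv_lines <- "id,height,age,shoe,eye,hair
10,67,23,8.5,brown,brown
11,67,19,9.5,green,brown
12,69,,8.5,hazel,blonde"

mydata <- read.table(text = csv_lines, header = TRUE, sep = ",",
                     row.names = "id", na.strings = " ")

names(mydata)       # id became the row names, so height is column 1
rownames(mydata)    # "10" "11" "12"
is.na(mydata$age[3])   # TRUE: the blank age field was read as missing

# Two equivalent ways to pull out the height column
identical(mydata[, 1], mydata$height)   # TRUE

mean(mydata$age)                 # NA, because one value is missing
mean(mydata$age, na.rm = TRUE)   # 21, the mean of the two recorded ages
```

Note the last two lines: PROC MEANS in SAS skips missing values automatically, while mean() in R returns NA unless you ask it to drop them with na.rm=TRUE.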
Simple summaries:
> summary(mydata[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
62.00 66.00 68.00 67.52 70.00 72.00
> summary(mydata$height)
Min. 1st Qu. Median Mean 3rd Qu. Max.
62.00 66.00 68.00 67.52 70.00 72.00
> mean(mydata$height)
[1] 67.52
> sd(mydata$height)
[1] 2.77068
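The value from sd() is the sample standard deviation, s = sqrt( sum (x_i - xbar)^2 / (n - 1) ). As a check, this sketch types in the 25 heights from the class data table, computes the mean and s by hand, and compares s with sd():

```r
# Heights of the 25 students, in id order, from the class data table
height <- c(62, 69, 71, 69, 66, 63, 66, 65, 67, 67, 67, 69, 66,
            70, 69, 72, 70, 71, 63, 65, 68, 69, 70, 70, 64)

n <- length(height)        # 25 students
xbar <- sum(height) / n    # 1688 / 25 = 67.52

# Sample standard deviation: divide by n - 1, as sd() does
s <- sqrt(sum((height - xbar)^2) / (n - 1))

c(xbar, s)
all.equal(s, sd(height))   # TRUE
```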
SIMPLE GRAPHICS
> barplot(table(mydata$hair))
> pie(table(mydata$hair))
> hist(mydata$shoe)
> plot(mydata$shoe,mydata$height)
> table(mydata$hair,mydata$eye)
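barplot() and pie() draw the counts that table() returns. The sketch below types in the hair and eye colors from the class data table (so it runs without classdata.csv) and shows the one-way and two-way counts; the vector names hair and eye are only for illustration.

```r
# Hair and eye colors of the 25 students, in id order, from the class data table
hair <- c("blonde", "auburn", "brown", "blonde", "blonde", "brown", "blonde",
          "blonde", "brown", "brown", "brown", "blonde", "blonde", "brown",
          "brown", "blonde", "brown", "brown", "blonde", "brown", "brown",
          "brown", "brown", "brown", "brown")
eye <- c("brown", "hazel", "brown", "hazel", "blue", "brown", "blue", "blue",
         "brown", "brown", "green", "hazel", "blue", "hazel", "hazel", "blue",
         "blue", "blue", "green", "blue", "hazel", "hazel", "brown", "blue",
         "blue")

counts <- table(hair)        # auburn 1, blonde 9, brown 15: the bar heights
counts

two_way <- table(hair, eye)  # rows = hair color, columns = eye color
two_way
sum(two_way)                 # 25 students in all
```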