Quant_Chapter_99_appendix_c

Appendix C: SAS Software
Uses of SAS
 CRM
 data mining
 data warehousing
 linear programming
 forecasting
 econometrics
 nonlinear parameter estimation
 simulation
 marketing models
 statistical analysis
Data Types SAS Can Deal With
 panel data
 relational databases
 scanner data
 Web log data
 questionnaires
Ideal When You Are …
 transforming
 manipulating
 massaging
 sorting
 merging
 performing lookups
 reporting
Mathematical Marketing
Slide C.1
SAS
Two Types of SAS Routines
 DATA Steps
• Read and Write Data
• Create a SAS dataset
• Manipulate and Transform Data
• Open-Ended - Procedural Language
• Presence of INPUT statement creates a Loop
 PROC Steps
• Analyze Data
• Canned or Preprogrammed Input and Output
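The implicit loop created by the INPUT statement can be seen in a minimal sketch like the one below. The dataset name loop_demo and the three data lines are made up for illustration; _n_ is the automatic variable that counts DATA step iterations.

```sas
/* Hypothetical example: the DATA step body runs once per input record. */
data loop_demo ;
  input x ;
  /* _n_ is the automatic iteration counter; PUT writes to the log */
  put 'Iteration ' _n_ 'read x=' x ;
  datalines ;
10
20
30
;
run ;

/* A PROC step then analyzes the dataset the DATA step created. */
proc means data=loop_demo ;
  var x ;
run ;
```

The DATA step writes one log line per record and stops when INPUT runs out of data, so no explicit loop or end-of-file test is needed.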
Slide C.2
A Simple Example
data my_study ;
input id gender $ green recycle ;
cards ;
001 m 4 2
002 m 3 1
003 f 3 2
••• ••• ••• •••
;
proc glm data=my_study ;
class gender ;
model recycle = green gender ;
Slide C.3
The Sequence Depends on the Need
data step to read in scanner data;
data step to read in panel data ;
data step to merge scanner and panel records ;
data step to change the level of analysis to the household ;
proc step to create covariance matrix ;
data step to write covariance matrix in LISREL compatible format ;
Slide C.4
The INPUT Statement - Character Data
 List input
$ after a variable - character var
input last_name $ first_name $ initial $ ;
 Formatted input
$w. after a variable
input last_name $22. first_name $22. initial $1. ;
 Column input
$ start-column - end-column
input last_name $ 1 - 22 first_name $ 23 - 44 initial $ 45 ;
Slide C.5
The INPUT Statement - Numeric Data
 List input
input score_1 score_2 score_3 ;
 Formatted input
w.d (field width and number of digits after an implied decimal point) after a variable
input score_1 10. score_2 10. score_3 10. ;
 Column input
start-column - end-column
input score_1 1 - 10 score_2 11 - 20 score_3 21 - 30 ;
Slide C.6
Grouped INPUT Statements
input (var1-var3) (10. 10. 10.) ;
input (var1-var3) (3*10.) ;
input (var1-var3) (10.) ;
input (name var1-var3) ($10. 3*5.1) ;
Slide C.7
The Column Pointer in the INPUT Statement
input @3 var1 10. ;
input more @ ;
if more then input @15 x1 x2 ;
input @12 x1 5. +3 x2 ;
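A minimal sketch of how the trailing @ and the @n column pointer can work together, assuming a made-up file in which column 1 holds a record type and only type A records carry data. The names mixed, type, x1 and x2 are illustrative.

```sas
/* Hypothetical example: read the record type first, hold the line,  */
/* then decide how to read the rest of that same line.               */
data mixed ;
  input type $ 1 @ ;                  /* trailing @ holds the current line */
  if type = 'A' then input @3 x1 x2 ; /* jump to column 3 and read two values */
  else delete ;                       /* skip records of any other type */
  datalines ;
A 12 7
B 99 99
A 3 5
;
run ;
```

Without the trailing @, the second INPUT statement would move on to the next record instead of re-reading the held one.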
Slide C.8
Documenting INPUT Statements
input
@4 green1 4. /* greenness scale first item */
@9 green2 4. /* greenness scale 2nd item */
@20 aware1 5. /* awareness scale first item */
@20 aware2 5. ; /* awareness scale 2nd item */
Slide C.9
The Line Pointer
input x1 x2 x3 / x4 x5 x6 ;
input x1 x2 x3 #2 x4 x5 x6 ;
input
x1 x2 x3
#2 x4 x5 x6 ;
Slide C.10
The PUT Statement
put x1 x2 x3 @ ;
input x4 ;
put x4 ;
put _all_ ;
put a= b= ;
put x1 #2 x2 ;
put _infile_ ;
put x1 / x2 ;
put _page_ ;
col1 = 22 ; col2 = 14 ;
put @col1 var245 @col2 var246 ;
Slide C.11
Copying Raw Data
filename in 'c:\old.data' ;
filename out 'c:\new.data' ;
data _null_ ;
infile in ;
file out ;
input ;
put _infile_ ;
Slide C.12
SAS Constants
'21Dec1981'D
'Charles F. Hofacker'
492992.1223
Slide C.13
Assignment Statement
x = a + b ;
y = x / 2. ;
prob = 1 - exp(-z**2/2) ;
Slide C.14
The SAS Array Statement
array y {20} y1-y20 ;
do i = 1 to 20 ;
y{i} = 11 - y{i} ;
end ;
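The loop above reverse-scores items measured on a 1 to 10 scale (1 becomes 10, 10 becomes 1). A minimal sketch with three items instead of twenty; the dataset name reversed and the data line are made up for illustration.

```sas
/* Hypothetical example: reverse-score three items on a 1-10 scale. */
data reversed ;
  input y1-y3 ;
  array y {3} y1-y3 ;      /* y{i} is another name for y1, y2, y3 */
  do i = 1 to 3 ;
    y{i} = 11 - y{i} ;     /* 1 -> 10, 5 -> 6, 10 -> 1 */
  end ;
  drop i ;                 /* the loop index is not needed in the output */
  datalines ;
1 5 10
;
run ;
```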
Slide C.15
The Sum Statement
variable + expression ;
/* is equivalent to */
retain variable 0 ;
variable = sum(variable, expression) ;
n+1 ;
cumulated + x ;
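A minimal sketch of the two sum statements above in a complete DATA step. The dataset name running and the data lines are made up; the missing value in the second record is there to show that a sum statement skips missing values rather than propagating them.

```sas
/* Hypothetical example: a record counter and a running total. */
data running ;
  input x ;
  n + 1 ;            /* n starts at 0 and is retained across iterations */
  cumulated + x ;    /* a missing x leaves the running total unchanged */
  datalines ;
4
.
6
;
run ;

proc print data=running ;
run ;
```

Under these assumptions cumulated takes the values 4, 4 and 10, whereas an ordinary assignment such as cumulated = cumulated + x would turn missing at the second record.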
Slide C.16
IF Statement
if a >= 45 then a = 45 ;
if 0 < age < 1 then age = 1 ;
if a = 2 or b = 3 then c = 1 ;
if a = 2 and b = 3 then c = 1 ;
if major = "FIN" ;
if major = "FIN" then do ;
a = 1 ;
b = 2 ;
end ;
Slide C.17
More IF Statement Expressions
if expression then etc ;
where expression can be, for example:
name ne 'smith'
name ~= 'smith'
x eq 1 or x eq 2
x=1 | x=2
a <= b | a >= c
a le b or a ge c
a1 and a2 or a3
(a1 and a2) or a3
Slide C.18
Concatenating Datasets Sequentially
first:
id x y
1 2 3
2 1 2
3 3 1
second:
id x y
4 3 2
5 2 1
6 1 1
data both ;
set first second ;
both:
id x y
1 2 3
2 1 2
3 3 1
4 3 2
5 2 1
6 1 1
Slide C.19
Interleaving Two Datasets
proc sort data=store1 ;
by date ;
proc sort data=store2 ;
by date ;
data both ;
set store1 store2 ;
by date ;
Slide C.20
Concatenating Datasets Horizontally
left:
id y1 y2
1 2 3
2 1 2
3 3 1
right:
id x1 x2
1 3 2
2 2 1
3 1 1
data both ;
merge left right ;
both:
id y1 y2 x1 x2
1 2 3 3 2
2 1 2 2 1
3 3 1 1 1
Slide C.21
Table LookUp
table:
part desc
0011 hammer
0012 nail
0013 bow
database:
id part
1 0011
2 0011
3 0013
proc sort data=database out=sorted ;
by part ;
data both ;
merge table sorted ;
by part ;
both:
id part desc
1 0011 hammer
2 0011 hammer
3 0013 bow
Within a BY group, the last observation is repeated if one of the input data sets has fewer observations
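As written, a MERGE with BY also produces a row for any part that appears only in the lookup table (part 0012 here, with a missing id). One way to keep only the rows that come from the database is the IN= dataset option, sketched below. The flag name in_database is made up, and the datalines simply re-enter the slide's two tables so the example stands alone.

```sas
/* Sketch: table lookup that keeps only rows present in the database. */
data table ;
  input part $ desc $ ;
  datalines ;
0011 hammer
0012 nail
0013 bow
;
run ;

data database ;
  input id part $ ;
  datalines ;
1 0011
2 0011
3 0013
;
run ;

proc sort data=database out=sorted ;
  by part ;
run ;

data both ;
  merge table sorted (in=in_database) ;  /* in_database = 1 when sorted contributes */
  by part ;
  if in_database ;                       /* subsetting IF drops the unmatched 0012 row */
run ;
```

The lookup table must also be in BY order; here it already is, so only the database is sorted.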
Slide C.22
Update
master:
part desc
0011 hammer
0012 nail
0013 bow
transaction:
part desc
0011 jackhammer
data new_master ;
update master transaction ;
by part ;
new_master:
part desc
0011 jackhammer
0012 nail
0013 bow
Slide C.23
Changing the Level of Analysis 1
Before:
Subject Time Score
A 1 A1
A 2 A2
A 3 A3
B 1 B1
B 2 B2
B 3 B3
After:
Subject Score1 Score2 Score3
A A1 A2 A3
B B1 B2 B3
Slide C.24
Changing the Level of Analysis 1
data after ;
keep subject score1 score2 score3 ;
retain score1 score2 ;
set before ;
if time=1 then score1 = score ;
else if time=2 then score2 = score ;
else if time=3 then do ;
score3 = score ;
output ;
end ;
Slide C.25
Changing the Level of Analysis 2
Before:
Day Student Score
1 A 12
1 B 11
1 C 13
2 A 14
2 B 10
2 C 9
After:
Day Student Highest
1 C 13
2 A 14
Slide C.26
Changing the Level of Analysis 2
FIRST. and LAST. Variable Modifiers
proc sort data=log ;
by day ;
data find_highest ;
retain highest ;
drop score ;
set log ;
by day ;
if first.day then highest=. ;
if score > highest then highest = score ;
if last.day then output ;
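The step above outputs whichever student happens to be on the last record of each day. To carry along the student who actually earned the highest score, one possible variation retains a second variable. This is a sketch only: top_student is a made-up name, and the datalines re-enter the Before table from the preceding slide.

```sas
/* Sketch: keep the highest score per day and the student who earned it. */
data log ;
  input day student $ score ;
  datalines ;
1 A 12
1 B 11
1 C 13
2 A 14
2 B 10
2 C 9
;
run ;

proc sort data=log ;
  by day ;
run ;

data find_highest ;
  set log ;
  by day ;
  length top_student $ 8 ;         /* hypothetical variable name */
  retain highest top_student ;     /* carry values across records */
  if first.day then do ;           /* reset at the start of each day */
    highest = . ;
    top_student = ' ' ;
  end ;
  if score > highest then do ;     /* a missing value sorts below any number */
    highest = score ;
    top_student = student ;
  end ;
  if last.day then output ;        /* one record per day */
  keep day highest top_student ;
run ;
```

With the slide's data this gives student C with 13 for day 1 and student A with 14 for day 2, matching the After table.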
Slide C.27
The KEEP and DROP Statements
keep a b f h ;
drop x1-x99 ;
data a(keep = a1 a2) b(keep = b1 b2) ;
set x ;
if blah then output a ;
else output b ;
Slide C.28
Changing the Level of Analysis 3
Spreading Out an Observation
Before:
Subject Score1 Score2 Score3
A A1 A2 A3
B B1 B2 B3
After:
Subject Time Score
A 1 A1
A 2 A2
A 3 A3
B 1 B1
B 2 B2
B 3 B3
Slide C.29
Changing the Level of Analysis 3 – SAS Code
data spread ;
drop score1 score2 score3 ;
set tight ;
time = 1 ; score = score1 ; output ;
time = 2 ; score = score2 ; output ;
time = 3 ; score = score3 ; output ;
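With more than a few scores, the three repeated lines can be replaced by an array and a DO loop. A minimal sketch, assuming the wide dataset is called tight as above; the array name s and the numeric scores in the datalines are made up for illustration.

```sas
/* Sketch: spread one wide record into one record per time point. */
data tight ;
  input subject $ score1-score3 ;
  datalines ;
A 7 8 9
B 4 5 6
;
run ;

data spread ;
  set tight ;
  array s {3} score1-score3 ;   /* hypothetical array name */
  do time = 1 to 3 ;
    score = s{time} ;           /* pick the score for this time point */
    output ;                    /* write one record per time point */
  end ;
  drop score1-score3 ;
run ;
```

Each input record produces three output records, so the two subjects here become six rows.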
Slide C.30
Use of the IN= Dataset Indicator
data new ;
set old1 (in=from_old1)
old2 (in=from_old2) ;
if from_old1 then … ;
if from_old2 then … ;
Slide C.31
Proc Summary for Aggregation
proc summary data=raw_purchases ;
by household ;
class brand ;
var x1 x2 x3 x4 x5 ;
output out=household mean=overall ;
Slide C.32
Using SAS for Simulations
data monte_carlo ;
keep y1 - y4 ;
array y{4} y1 - y4 ;
array loading{4} l1 - l4 ;
array unique{4} u1 - u4 ;
l1 = 1 ; l2 = .5 ; l3 = .5 ; l4 = .5 ;
u1 = .2 ; u2 = .2 ; u3 = .2 ; u4 = .2 ;
do subject = 1 to 100 ; /* simulation loop */
eta = rannor(1921) ;
do j = 1 to 4 ;
y{j} = eta*loading{j} + unique{j}*rannor(2917) ;
end ;
output ;
end ;
proc calis data=monte_carlo ;
etc. ;
Slide C.33
External Data Sets and Windows/Vista
filename trans 'C:\Documents\june\transactions.data' ;
libname clv 'C:\Documents\customer_projects\' ;
...
data clv.june ;
infile trans ;
input id 3. purch 2. day 3. month $ ;
Slide C.34