DUMP A SAS DATA SET TO A FLAT FILE
TO USE IT WITH ANY POSTALSOFT PRODUCTS
FOR BUSINESS AND MARKETING ANALYSIS
Sergey Sianissian, PreVision Marketing, Lincoln, MA
INTRODUCTION
The key to successful database marketing programs lies in
segmenting customers based on their detailed buying activity,
in order to target the “right” audiences for direct mail and
other targeted promotions. Nowhere is the rigorous use of data
for these applications more critical than for grocers, who have
transaction-rich databases but razor-thin margins for targeted
marketing. To implement these complex, challenging targeting
assignments for grocers and their packaged-goods vendors,
advanced analytical software (such as SAS) is often used. In
addition, direct mail marketing software (e.g., PostalSoft,
Group1) is needed to identify and parse names, titles and
address data from multiple input files, to correct and
standardize addresses, to eliminate duplicate records, etc.
These programs usually operate on files in ASCII or DBF format.
That is why converting SAS data sets into flat files is a
common task when developing advanced targeting applications.
A number of techniques have been published for doing this:
Kretzman (1992), Whitlock (1993) and Carpenter (1998). To
determine the data structure, most of these techniques, like
this article, use PROC CONTENTS. The distinguishing features of
the code below are:
- it generates the SAS program that actually converts the SAS
data set into an ASCII flat file.
- it creates, along with the flat file, a FORMAT file
(Appendix 1), which is required by PostalSoft programs
(ACE, Merge/Purge, etc.).
- it handles the different representations of dates in a SAS
file.
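As a minimal illustration of how PROC CONTENTS exposes the data structure, the sketch below builds a small data set and lists the metadata columns that the conversion relies on. The data set work.test, its variables and its values are hypothetical and serve only as an example.

```sas
/* Hypothetical sample data set: one date and two character fields */
data work.test;
  length addr1 $30 city $20;
  format datebrth date9.;
  datebrth = '05OCT1997'd;
  addr1 = '1 Main Street';
  city = 'Anytown';
run;

/* Write the variable metadata to a data set instead of a listing */
proc contents data=work.test
     out=work.vars(keep=name type length format npos) noprint;
run;

/* Order the variables by their physical position in the record */
proc sort data=work.vars;
  by npos;
run;

proc print data=work.vars noobs;
run;
```

For datebrth, the FORMAT column holds the format name DATE without its width, LENGTH is 8 and TYPE is 1 (numeric), which is why the code below tests for format='DATE'.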
CONVERT SAS DATA SET TO A FLAT FILE
/*************************************************/
/** This program runs PROC CONTENTS on a SAS    **/
/** data set and uses the results to generate   **/
/** a program that creates FLAT and FORMAT      **/
/** files from this SAS data set. It then       **/
/** executes the generated program.             **/
/*************************************************/
libname sasdirct 'D:\tape_oct98\sas data';          /* ❶ */

%MACRO genrflat(sasdirct, filename, flatfile);

proc contents data=&sasdirct..&filename
     out=soderz(keep=name length label format Npos) noprint;
run;

proc sort data=soderz;
  by Npos;
run;

/** Calculate the record's length and  **/
/** create a MACRO variable            **/
/** with its value                     **/
data _NULL_;
  set soderz(keep=length name Npos format) END=last;
  by Npos;
  if format='DATE' then length+2;                   /* ❷ */
  sumlen+length;
  if last then do;
    sumlen=sumlen+1;        /* one extra byte for EOR */
    /* store the value without leading blanks */
    call symput('lenrec', trim(left(put(sumlen, 8.))));
  end;
run;

data _NULL_;
  set soderz END=eof;
  by Npos;
  retain posN 1;            /* start column of the current field */
  if format='DATE' then length+2;

  /** Generate the SAS program that  **/
  /** creates the FLAT file          **/
  file "&flatfile.\CreateFlat&filename..sas";
  if _N_=1 then do;                                 /* ❸ */
    put "filename FLAT '&flatfile.\FLAT&filename..txt';" /
        'data _NULL_;' /
        "set &sasdirct..&filename;" /
        "length EOR $1;" /
        "EOR='X';" /
        "file FLAT lrecl=&lenrec;" /
        'put' /
        '@' @2 posN
        @20 name @30 '/** '
        length @38 '**/';
  end;
  if _N_ > 1 then do;
    put '@' @2 posN                                 /* ❹ */
        @20 name @30 '/** '
        length @38 '**/';
  end;
  if eof then do;
    put '@' @2 "&lenrec" @20 'EOR'
        @30 '/** 1' @38 '**/' /
        ';' /
        'run;';
  end;
  /* advance by the (possibly extended) field length so that */
  /* consecutive fields never overlap                        */
  posN=posN+length;                                 /* ❺ */

  /** Create the FORMAT file for  **/
  /** PostalSoft products         **/
  file "&flatfile.\FLAT&filename..FMT";
  put name +(-1) ',' length +(-1) ',c';
  if eof then put 'EOR,3,b';                        /* ❻ */
run;

filename IN "&flatfile.\CreateFlat&filename..sas";  /* ❼ */
%INCLUDE IN;
%mend genrflat;

%genrflat(sasdirct, test, D:\tape_oct98)            /* ❽ */
❶ Specify the path to the SAS data set.
❷ If a SAS data set contains a numeric “date” variable
(maximum length = 8) whose format is DATE9. (for example,
05OCT1997), the last byte (in this example, “7”) can be lost
in the flat file. To avoid this, extend the variable's length.
❸ This DO block creates the header of the generated SAS
program (Appendix 2).
❹ This DO block creates the rest of the generated SAS program.
❺ To get a cleanly column-aligned flat file, use the calculated
variable posN instead of the variable Npos from PROC CONTENTS.
Npos is an offset within the SAS data set and does not reflect
the extended date lengths.
❻ For PostalSoft programs, the End Of Record (EOR) field
should have a binary type.
❼ Call the generated SAS program.
❽ The macro genrflat takes three arguments:
- the libref of the library that contains the SAS data set
(see ❶)
- the name of the SAS data set
- the destination directory for the flat file
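The effect of notes ❷ and ❺ on the column layout can be sketched in isolation. The step below is only an illustration under stated assumptions: the data set meta is a hypothetical stand-in for the PROC CONTENTS output, with the field names and lengths taken from Appendix 1, DATEBRTH assumed to be stored with length 8, and the variables len, fmt, start and nextpos introduced purely for this example.

```sas
/* Hypothetical metadata: name, stored length and format name */
data meta;
  infile datalines truncover;
  length name $32 fmt $32;
  input name $ len fmt $;
  datalines;
DATEBRTH 8 DATE
ADDR1 30
CITY 20
STATE 2
ZIPFULL 10
LNAME 20
FNAME 15
;
run;

/* Derive the start column of each field as a running total */
data layout;
  set meta end=last;
  retain nextpos 1;                    /* first field starts in column 1 */
  if fmt = 'DATE' then len = len + 2;  /* widen date fields as in note 2 */
  start = nextpos;
  nextpos = nextpos + len;
  /* after the last field, nextpos is the column of the EOR byte */
  if last then call symputx('lenrec', nextpos);
  drop nextpos;
run;

proc print data=layout noobs;
  var name start len;
run;

%put Record length: &lenrec;
```

With these inputs the start columns come out as 1, 11, 41, 61, 63, 73 and 93, and the record length as 108, so no field overlaps its neighbor even though the date field was widened.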
Converting a SAS data set of 1,000,000 records to a flat file
takes approximately 2 minutes.
CONCLUSION
This paper presents SAS code that converts a SAS data set to a
flat file and creates its FORMAT file. Both files can be used
extensively for business and marketing analysis with direct
marketing name and address hygiene programs (e.g., PostalSoft,
Group1). The code takes into account the different formats in
which dates are represented in SAS data sets.
ACKNOWLEDGMENTS
Song Jungdong contributed extensively to the
development of this paper. His support and
suggestions are greatly appreciated.
PreVision Marketing is a database marketing
agency specializing in the development and
implementation of relationship marketing
strategies including comprehensive customer
loyalty, upgrade and acquisition programs.
PreVision provides strategic, analytic, creative
and mail production services along with support
in the selection and efficient use of the newest
database technologies.
AUTHOR CONTACT INFORMATION
Sergey Sianissian
PreVision Marketing, Inc
55 Old Bedford Road
Lincoln, MA 01773
Direct: (781) 259-5169
Fax: (781) 259-1548
E-mail: ssianissian@previsionmarketing.com
TRADEMARK INFORMATION
SAS is a registered trademark of SAS Institute Inc.
PostalSoft is a registered trademark of Firstlogic Inc.
APPENDIX 1
EXAMPLE OF A FORMAT FILE
DATEBRTH,10,c
ADDR1,30,c
CITY,20,c
STATE,2,c
ZIPFULL,10,c
LNAME,20,c
FNAME,15,c
EOR,3,b
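Lines in this name,length,type layout can be produced with list-style PUT output and the +(-1) pointer control, which backs up over the blank that list output appends after each variable value. The following is a minimal sketch that writes to the SAS log rather than to a file; the data set fields and the variable len are hypothetical names used only for this illustration.

```sas
/* Hypothetical field list: name and output width of each field */
data fields;
  infile datalines truncover;
  length name $32;
  input name $ len;
  datalines;
ADDR1 30
CITY 20
STATE 2
;
run;

/* Write one "name,length,type" line per field to the SAS log */
data _null_;
  set fields end=eof;
  put name +(-1) ',' len +(-1) ',c';
  if eof then put 'EOR,3,b';   /* closing end-of-record entry */
run;
```

Adding a FILE statement before the PUT statements would direct the same lines to an external file instead of the log.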
APPENDIX 2
EXAMPLE OF A GENERATED SAS PROGRAM
filename FLAT 'D:\tape_oct98\FLATtest.txt';
data _NULL_;
set sasdirct.test;
length EOR $1;
EOR='X';
file FLAT lrecl=108;
put
@1                 DATEBRTH  /** 10  **/
@11                ADDR1     /** 30  **/
@41                CITY      /** 20  **/
@61                STATE     /** 2   **/
@63                ZIPFULL   /** 10  **/
@73                LNAME     /** 20  **/
@93                FNAME     /** 15  **/
@108               EOR       /** 1   **/
;
run;
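One way to check a flat file of this kind is to read it back by column position and compare the result with the source data. The sketch below does this for a shortened, hypothetical layout (four fields plus EOR) written to a temporary file; the fileref flat, the data set check and the sample values are assumptions made for this illustration only.

```sas
filename flat temp;    /* temporary file, used only for this sketch */

/* Write one record in a fixed-column layout */
data _null_;
  length addr1 $30 city $20 state $2 eor $1;
  format datebrth date9.;
  datebrth = '05OCT1997'd;
  addr1 = '1 Main Street';
  city = 'Anytown';
  state = 'MA';
  eor = 'X';
  file flat lrecl=64;
  put @1 datebrth @11 addr1 @41 city @61 state @63 eor;
run;

/* Read the record back using the same column positions */
data check;
  infile flat lrecl=64 truncover;
  input @1  datebrth date9.
        @11 addr1    $char30.
        @41 city     $char20.
        @61 state    $char2.
        @63 eor      $char1.;
  format datebrth date9.;
run;

proc print data=check noobs;
run;
```

Because the date field is given 10 columns, the full DATE9. value survives the round trip, and the EOR byte in the last column keeps every record at the same length.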