Worst data I've ever seen

worst, but still
importable data
I’ve ever seen
Arthur Tabachneck
Insurance Bureau of Canada
suppose you had the following Excel file:
[figure: spreadsheet with a date column; callouts mark the cell formats as text, as shown, m/d/yyyy, d/m/yyyy, text, d-mon, text]
SAS
Forum
Coder’s Corner
April 12, 2010
how the file got so bad:
members of a secretarial pool were
asked to enter the data, in Excel, while
they were covering the front desk
they (four different secretaries), obviously,
weren’t given sufficient instructions
their task was simply to enter some data,
which happened to include a date
proc import can only be used if:
1. you license SAS/Access Interface to PC File Formats
and
2. at least half of the relevant rows (based on your system's and SAS guessingrows settings) are formatted as dates
or
3. you manually edit the spreadsheet and/or change your guessingrows settings so that condition #2 holds
If proc import can be used, three steps are necessary
step 1: use mixed=no
which will import date formatted cells
and assign missing values to the other cells
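A minimal sketch of what step 1 might look like. It assumes the workbook is the c:\worst data.xls file opened later in the DDE steps, that the first row holds the column name, and that the result is stored as inputa (the name read in step 3); none of these are confirmed for this step by the slides.

```sas
/* Sketch only: path, getnames= and the output name are assumptions. */
proc import datafile="c:\worst data.xls"
            out=inputa
            dbms=excel
            replace;
  getnames=yes;
  mixed=no;   /* date-formatted cells import as dates; other cells become missing */
run;
```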
step 2: use mixed=yes
which will import all cells as text
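A minimal sketch of what step 2 might look like, under the same assumptions as before: the workbook path is taken from the DDE steps, the first row is assumed to hold the column name, and the output name inputb is chosen to match the data set read in step 3.

```sas
/* Sketch only: path, getnames= and the output name are assumptions. */
proc import datafile="c:\worst data.xls"
            out=inputb
            dbms=excel
            replace;
  getnames=yes;
  mixed=yes;   /* every cell in the column is imported as text */
run;
```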
step 3: merge the two files and use inputn to read missing dates
options datestyle=dmy;   /* global option: read ambiguous dates as day-month-year */
data want (drop=bdate);
  set inputa;
  set inputb (rename=(date=bdate));
  /* first try the text value as is */
  if missing(date) then do;
    date=inputn(bdate, 'anydtdte', 20);
  end;
  /* if that fails, swap the first two dash-delimited parts and retry */
  if missing(date) then do;
    date=inputn(catt(scan(bdate,2,'-'), scan(bdate,1,'-'),
                     scan(bdate,3,'-')), 'anydtdte', 20);
  end;
run;
resulting in the following file
however, if proc import can’t be used
or
if you simply want a better solution
you can do it with DDE
step 1: set desired options and filename
options noxsync noxwait xmin;
filename sas2xl dde 'excel|system';
Step 2: Open Excel
data _null_;
  length fid rc start stop time 8;
  fid=fopen('sas2xl','s');
  /* if the DDE link cannot be opened, start Excel and retry for up to 10 seconds */
  if (fid le 0) then do;
    rc=system('start excel');
    start=datetime();
    stop=start+10;
    do while (fid le 0);
      fid=fopen('sas2xl','s');
      time=datetime();
      if (time ge stop) then fid=1;
    end;
  end;
  rc=fclose(fid);
run;
Step 3: Open workbook and insert old-style macro sheet
data _null_;
  file sas2xl;
  put '[open("c:\worst data.xls")]';
run;
data _null_;
  file sas2xl;
  put '[workbook.next()]';
  put '[workbook.insert(3)]';
run;
filename xlmacro dde 'excel|macro1!r1c1:r99c1'
  notab lrecl=200;
Step 4: Create and run Excel macro
data _null_;
  /* write the macro into the macro sheet; it prefixes each cell's contents with the tag "<>" */
  file xlmacro;
  put '=set.name("Tag",!$b$1)';
  put '=formula("<>",Tag)';
  put '=set.name("OldValue",!$c$1)';
  put '=set.name("NewValue",!$b$2)';
  put '=for.cell("CurrentCell",sheet1!$a$2:$a$99,true)';
  put '=formula(get.cell(5,CurrentCell),OldValue)';
  put '=formula("=concatenate(Tag,OldValue)",NewValue)';
  put '=formula(NewValue, CurrentCell)';
  put '=next()';
  put '=halt(true)';
  put '!dde_flush';
  /* run the macro, save sheet1 as CSV (file type 6), and quit Excel */
  file sas2xl;
  put '[run("macro1!r1c1")]';
  put '[workbook.activate("sheet1")]';
  put '[error(false)]';
  put '[save.as("c:\DateTest",6)]';
  put '[quit()]';
run;
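Step 5: Import the data
options datestyle=dmy;   /* global option: read ambiguous dates as day-month-year */
data want (keep=date);
  infile "c:\DateTest.csv" dsd dlm="," lrecl=32768 firstobs=2;
  informat rawdate $20.;
  input rawdate;
  format date date9.;
  rawdate=substr(rawdate,3);   /* strip the two-character "<>" tag added by the macro */
  if anyalpha(rawdate) then do;
    /* text dates: try the value as is, then with its first two parts swapped */
    date=inputn(rawdate, 'anydtdte', 20);
    if missing(date) then do;
      date=inputn(catt(scan(rawdate,2,'-'), scan(rawdate,1,'-'),
                       scan(rawdate,3,'-')), 'anydtdte', 20);
    end;
  end;
  /* Excel serial numbers: convert explicitly, then shift to the SAS epoch */
  else date=input(rawdate, best32.)-21916;
run;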
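The subtraction of 21916 in step 5 converts an Excel serial date to a SAS date: in Excel's 1900 date system 1 January 1960 is serial number 21916, while SAS counts that same day as 0. A small check, using a hypothetical serial value that is not taken from the slides:

```sas
data _null_;
  excel_serial = 40280;                  /* hypothetical Excel serial number */
  sas_date     = excel_serial - 21916;   /* shift to the SAS epoch (01JAN1960 = 0) */
  expected     = mdy(4, 12, 2010);       /* the date that serial should represent */
  put sas_date= date9. expected= date9.;
run;
```

Both values written to the log should be the same date. The offset assumes the workbook uses Excel's default 1900 date system rather than the 1904 system.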
and obtain the desired result
regardless of your system’s guessing rows setting
or how your data is arranged
Author Contact Information
Your comments and questions are valued and encouraged.
Contact the author:
Dr. Arthur Tabachneck
Director, Data Management
Insurance Bureau of Canada
Toronto, Ontario L3T 5K9 Canada
atabachneck at ibc dot ca or
art297 at netscape dot net
Key References
Microsoft Corporation. Function Reference Microsoft EXCEL Spreadsheet with
Business Graphics and Database: Version 4.0 for Apple® Macintosh® Series
or Windows™ Series. Document AB26298-0592, 1992.
Vyverman, K. Excel Exposed: Using Dynamic Data Exchange to Extract Metadata
from MS Excel Workbooks, SESUG 17, 2003, paper TU15, St. Pete Beach, FL
Vyverman, K. Re: How to flag special formatting from Excel in a SAS dataset.
SAS-L Post , 2002,
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0209a&L=sas-l&D=1&O=A&P=12088
Vyverman, K. Re: MS Excel column widths. SAS-L Post , 2002,
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0201b&L=sas-l&P=25268
Vyverman, K. Using Dynamic Data Exchange to Export Your SAS Data to MS Excel
– Against All ODS, Part I, SUGI 26, 2001, paper 190-27, Long Beach, CA.