Creating a Compact Columnar Output with PROC

advertisement
Creating a Compact Columnar
Output with PROC REPORT
Walter R. Young
Principal Clinical Programmer Analyst
Wyeth
Why create a columnar output from a data set.
For own use.
 To show others at a meeting.
 To put in a standardized regulatory report.
 To put in a publication.
The above are ranked in approximately
increasing effort required to create the report.
Author’s Opinion: It is the neatest, most
heuristic way to present a data set.

PROC CONTENTS for ECGTEST
57 Variables 28 Observations
Solution 1: Default PROC PRINT
Advantages:
 Easy and Neat for Narrow Data Set
 Can Use ID and VAR Statements.
Disadvantages:
 Virtually No Beautification Options
 Doesn’t Work for a Wide Data Set
 Wraps Neatly but One Can’t Control Wrapping
with the exception of increasing the page size.
 Wide Columns are Truncated.
Default PROC PRINT
Default PROC REPORT
Solution 2: Use a Default PROC REPORT
with NOWD (Required in Batch) Option
Advantages
 Output is columnar.
 Small number of variables fit on a page.
Disadvantages
 Rows are not identified for multiple pages.
 Spacing between columns is uneven.
 Column labels split unattractively.
 Column order isn’t optimum.
 Wide columns cause program to bomb.
PROC REPORT Defaults and Properties








PROC was designed to run interactively.
Labels are used as column headers.
Variables are in position order.
Spacing is 2 including before the first column.
Actual spacing rules are explainable but messy.
WRAP with NAMED options create messy output.
If all variables are numeric and none are specified
as DISPLAY, they are summed instead of listed.
MISSING option is needed to print all data rows.
If a variable name not in the data set, is listed in both the
COLUMN and DEFINE statements, no error will result.
PROC REPORT Justification Rules




Default is right for numeric - left for character.
Numerical values are right justified within formats
which are justified within the width.
Character values are justified in width with leading
blanks retained and trailing blanks eliminated.
Rules apply simultaneously to labels and values.
PROC REPORT SPLIT Character






The SPLIT character splits both labels and FLOW variables.
Choose a printable character(e.g., “~”, “|”, “\”) other than
the default “/” which is common in many entered texts.
If a word’s length in a flowed variable is greater than the
variable’s width, the word will split at that width.
To indent flowed text, insert a split character plus the
desired number of spaces and one at the end of the text.
If there is a split character in the flowed variable, words at
the end of the field will split randomly due to a SAS bug
which will be fixed in a future SAS version. To fix this one
either widens the field to eliminate non-indented flow or
writes a macro to insert split characters where desired.
If unprintable printer control characters exist in the flowed
variable, they must be removed. This is an uncommon problem
which can happen if data is coming from many sources.
Solution 3: Use ID Statement (introduced in 6.12)
With PROC REPORT





Observations are identified.
However, All Other Default Problems Exist.
Won’t Work if the Width of any Variable
Exceeds the Inherent PROC REPORT limit.
Plus the Width of the ID variables Plus the
Spaces Between Columns Exceeds Line Size.
(In this case the FLOW option must be used.)
Solution 4: PROC REPORT with Minimal Options
for a Narrow Data Set







Must Use a COLUMN Statement (Analogous to a
VAR statement in PROC PRINT).
Use a BREAK Statement for spaces between lines.
Use HEADLINE, HEADSKIP, BREAK Statement or
underline to separate labels from the observations.
Specify a Constant Spacing Between Columns.
Customize Labels in the DEFINE statements or Use
Variable Names (System NOLABEL option).
Possibly use PANEL option to minimize paper use.
The above gives you most of the features of using a
PUT statement formatting (DATA _NULL_).
Solution 5: Use PROC REPORT With All
Applicable Options for a Wide Data Set
If the width of a data set won’t fit within the line
size, one should make the output compact.
In a compact output, the maximum number of
observations of the variables should be made to
either fit on the width of a single page or on the
width of a minimum number of pages.
Author’s Opinion: Presenting the data in columns
on a single page width, neatly and informatively, is
more heuristic than presenting it on multiple pages.
To make the report compact








Make every reasonable effort to limit width to a single page.
Reduce the space between columns to one.
Drop space before 1st column (SPACING=0 in DEFINE).
Drop unnecessary variables from COLUMN statement.
Drop variables having the same value for all observations
and consider putting them in a title, footnote or legend.
Sort the data by sensible variables having a fair number
of rows for each combination in the BY statement and
use the BY in PROC REPORT.
For data sets wider than a single page, pick the minimum
# of ID variables to adequately identify all observations.
Balance the width of the non-ID variables across pages.
Use PROC FREQ to determine whether long variables
can be coded and describe the code in a legend.
To make the report compact








Don’t use the FLOW option unless necessary as it
increases the number of lines per observation. Also,
consider not using the SPLIT character in its label.
Don’t alter any variable if proofreading.
Eliminate variables which have a one to one
relationship with other variables.
Sensibly condense character variables.
Edit variables without eliminating their meaning.
Transfer meaning from a variable to its label.
Since formats can alter variable widths, apply them
prior to calculating column widths.
Use the STYLE attribute, some of the 6 font parameters
and ODS. While good for publications, this doesn’t
support a standardized line size and appearance.
For alphanumeric variables
Determine their maximum width in the data set.
 If a format increases this width, use that width.
 Consider removing any invariant prefixes or suffixes
 If the FLOW parameter is required, consider the line
size constraint, calculate the width plus spacing of
all other variables and:
For a single FLOW variable, use its maximum width.
For multiple FLOW variables, determine how to best
allocate their widths to minimize lines per observation.
 See if other data can be put on the added line(s) per
observation (e.g., concatenate visit date, SPLIT
character and visit name and use the FLOW option).

For numeric variables



Determine their range, maximum value and whether
they’re integer and then specify an appropriate
format (not the default BEST) and decimal point.
For date time variables, specify an appropriate
compact format (e.g., MMDDYY6.). Separate date
and time with DATEPART. If time is missing for all
observations remove it from the report.
If it has a format which transforms it into an
alphanumeric variable, apply the format and treat it
as though it were an alphanumeric variable.
To increase the attractiveness
of a compact report





Appropriately order the COLUMN statement variables.
Appropriately specify ORDER variables and the BREAK
statement (e.g., blank line between ID variables).
Use informative labels neatly spanned in the COLUMN
statement and appropriately split in the DEFINE.
Use tricks (e.g., unprintable character at end of label,
SPLIT character and blank at beginning of label, spacing
=0) to separately justify labels and values.
Use neat and informative titles, footnotes and/or legends.
If necessary, expand a label’s meaning in a legend.
20 Possible Variables for Compact Output
Condensing and Editing LVALC and SASNAME
Compact Output (15 Variable) Data Step
Compact Output
Final Compact Output
Compact Output PROC REPORT
Compact Output With BY Statement
Compact Output With Patient ID Option
Compact Output With Patient ID Option
General ECGTEST Compact Report










Decide what variables to always exclude.
Decide constant variables for title.
Decide what 1 to 1 variables to include.
Count and remove applicable leading zeroes.
Determine which data condensing tricks work.
If not condensable, they must be output in full
Predetermine variables that need FLOW.
Determine which variables have a fixed width.
Calculate width of all remaining variables.
Use BY variables and the ID option for date.
Compact Output With Visit ID Option
Compact Output With Visit ID Option
Original AE PROC REPORT Code
Changes Made to Produce Final Listing

Width of all variables was minimized.
Leading zeroes were stripped from subject; it was
concatenated with age sex and FLOW added.
Century was eliminated and date was output after
subject and made an ORDER variable.
Body system was coded into footnotes.
Verbatim label indentation was corrected.

Labels were beautified. “STUDY DAY” was centered.




Width of indented column was maximized to
eliminate FLOW of the concatenated variables.
 DAI was put in data set and FLOW added.
The above reduced the output from 21 to 11 pages.

Final AE PROC REPORT Code
Creating an Automated AE Listing
Find variables needed for other projects and:
 Find if their width is variable and calculate it.
 Make attractive labels with SPLIT characters.
 Exclude them if they are blank (e.g., time).
 Use minimum possible width.
 2 lines per observation: Thus use FLOW?
 Maximize width of the verbatim variable.
 User should specify variables and their order.
 Change footnotes to an automated legend.
 Add options for the BY variables.

Download