How to Create an XML Map with the XML Mapper

NESUG 2010
Foundations and Fundamentals
Wendi L. Wright
CTB McGraw-Hill
ABSTRACT
You’ve been given an XML file and told to read it into SAS. You open this file and think to yourself, “This
looks like a huge amount of work.” However, using the XML Mapper to create an XML map can take only
a few minutes, and then SAS will treat the XML file as if it were a SAS dataset.
This presentation examines an XML file and then shows how to create an XML map for it using
the XML Mapper. The XML Mapper is a really nice tool that makes it extremely easy to read an XML file
into SAS. It even provides the SAS code you need to read in your XML file with this map.
INTRODUCTION
XML is a hierarchical data format similar to HTML, but stricter in its use of start and end tags: XML
requires that every start tag have a matching end tag (an element with no content may instead be written
as a single self-closing tag such as <name/>). See below for a short example of an XML file:
Figure 1
<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>
You can see that for each start tag, there is also an end tag. Start tags have the form <name>, and end
tags put a slash in front of the name: </name>. A start tag can also carry attributes, such as
ReqID="00001" on GRResolveRequest; nearly all of the data values in this file are stored as attributes.
There is also an overall start tag and end tag (<MQDATA> and </MQDATA>) that enclose the entire file.
The only line without an end tag is the first line, the XML declaration, which states which version of XML
is being used.
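The end-tag rule can be checked mechanically. The minimal Python sketch below (Python is used here only for illustration; it plays no part in the XML Mapper workflow) hands the Figure 1 file to the standard library's xml.etree.ElementTree parser, which refuses any file whose tags are not properly nested and closed. The helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim (the string must start with the declaration).
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (tags properly nested and closed)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Delete one end tag: <ItemLst> is now closed by </ReqGrpDet>, a mismatched tag.
broken = FIGURE_1.replace("</ItemLst>", "", 1)

print(is_well_formed(FIGURE_1))  # True
print(is_well_formed(broken))    # False
```

Removing a single end tag is enough to make the parser reject the whole file.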
INSTALLING THE XML MAPPER
The XML Mapper software (freeware) can be downloaded from:
http://www.sas.com/apps/demosdownloads/92_SDL_sysdep.jsp?packageID=000513
Once you have downloaded the software, run it to install it on your system.
USING THE XML MAPPER
When you first open the XML Mapper, you see a screen with three main parts. The upper left pane shows
the Mapper’s interpretation of your XML file once you open it. The upper right pane shows how you want
the SAS datasets created (which datasets, and which variables to pull into each). The bottom pane holds
several source and output tabs that are useful references while you design your SAS datasets.
(Screenshots of the XML Mapper window: Top Left, Top Right, and Bottom panes.)
LOADING THE XML FILE INTO THE XML MAPPER
At the top is the standard Windows menu bar, which includes File. From the File menu, choose “Open
XML” and select your XML file.
This adds text to the upper left and bottom panes of the screen. The bottom pane shows the file exactly
as it appears in Figure 1 above.
However, the top left pane now shows the Mapper’s interpretation of the file. To see the full
interpretation, expand it by clicking the little plus signs at the left of each entry. Here is the interpretation
of the file above:
You can see that the Mapper displays this file hierarchically. These elements can now easily be mapped
to SAS datasets and to variables within those datasets.
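To picture what the tree in the upper left is doing, here is a rough Python sketch that prints the same kind of outline for Figure 1: one line per element, indented by level, with attribute names marked by @. It assumes repeated siblings (the three ItemDet elements) should appear once, as a structural summary; the helper outline is hypothetical, and the Mapper's own display differs in detail.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim.
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def outline(elem, depth=0):
    """Return indented lines: each element name, followed by its attribute names."""
    line = "  " * depth + elem.tag
    if elem.attrib:
        line += "  @" + " @".join(elem.attrib)
    lines = [line]
    seen = set()
    for child in elem:
        # List repeated siblings (e.g. the three ItemDet elements) only once.
        if child.tag not in seen:
            seen.add(child.tag)
            lines.extend(outline(child, depth + 1))
    return lines


for line in outline(ET.fromstring(FIGURE_1)):
    print(line)
```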
CREATING AN XML MAP
Now we move to the upper right. The first thing we need to define is the SAS dataset(s) in which we want
our data stored. The simplest way to do this is to have the XML Mapper do it for you! All you have to do
is open the Tools menu and pick “AutoMap using XML”.
After you select this option, the XML Mapper adds a number of entries to the top right pane.
Each entry will be a SAS dataset. If you click the little plus sign at the left of each entry, you will see
which variables that dataset will hold. The name of the XML map is currently ‘AUTO_GEN’. We can
change this name by modifying the ‘Name’ field in the ‘Properties’ tab at the top of the screen. Let’s call
it MyFirstMap and also add a description to the map. Notice that as you modify the name at the top, the
name in the flowchart also changes:
You can also create your own customized SAS datasets by clicking and dragging entries (both datasets
and variables) from the top left pane to the top right pane. In some cases, you can create a single dataset
holding all the information from every level of the XML file. This also requires setting the ‘Retain’ option
on the ‘Properties’ tab for the variables from the higher levels. An example appears later in the paper.
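As a simplified picture of what AutoMap produces, the Python sketch below groups the Figure 1 elements into one "dataset" per element name, with one row per occurrence and one column per attribute. This is our approximation, not the Mapper's algorithm: the rules AutoMap actually uses to choose tables, and any key columns it adds to link tables from different levels, are not modeled. The name auto_tables is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim.
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def auto_tables(text):
    """One 'dataset' per element name that carries attributes:
    one row per occurrence, one column per attribute."""
    tables = {}
    for elem in ET.fromstring(text).iter():  # document order, root included
        if elem.attrib:
            tables.setdefault(elem.tag, []).append(dict(elem.attrib))
    return tables


tables = auto_tables(FIGURE_1)
for name, rows in tables.items():
    print(name, len(rows), "row(s), columns:", list(rows[0]))
```

Keeping the request-level values in one table and the item-level values in another is what later makes either a merge or the ‘Retain’ approach necessary to bring them together.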
WORKING WITH THE XML MAP DATASETS AND VARIABLES
When you start expanding the dataset entries in the top right pane, you will see that the Mapper has
automatically populated each dataset with the variables from the corresponding entries in the top left. Pay
particular attention to the entries with the little red triangles next to them; these are the dataset variables.
If you do not want one of these variables included in the final SAS dataset, right-click the variable and
choose to delete it. You can also change the order of the variables in the dataset by right-clicking and
choosing 1) Move First, 2) Move Up, 3) Move Down, or 4) Move Last.
Each dataset variable can be modified using the tabs at the top. On the ‘Properties’ tab, you can modify
the Name and Description. I would not recommend changing the path unless you are very experienced
with XML maps. Using the variable DocComCD as an example, let’s modify some of these settings.
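The path is what ties each variable to a location in the XML file. Assuming it takes the XPath-style form used in SAS XML maps (an element path ending in @ plus the attribute name for attribute values), this Python sketch lists the path for every attribute in Figure 1. The helper attribute_paths is hypothetical; the ‘Properties’ tab remains the authority on the exact text.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim.
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def attribute_paths(elem, parent=""):
    """Yield an XPath-style location path for every attribute, e.g. /A/B/@name."""
    path = f"{parent}/{elem.tag}"
    for name in elem.attrib:
        yield f"{path}/@{name}"
    for child in elem:
        yield from attribute_paths(child, path)


# dict.fromkeys drops the duplicates produced by the three ItemDet elements, keeping order.
paths = list(dict.fromkeys(attribute_paths(ET.fromstring(FIGURE_1))))
for p in paths:
    print(p)
```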
Using the ‘Format’ tab, you can change the variable’s type and length, and add a format and informat.
Currently, the Mapper lists this variable as Character with length 32. We can leave it as is or modify it as
needed. As usual, the choices for Type are Character or Numeric. Character types also let you specify a
length; this box is grayed out if you choose Numeric.
A datatype can be chosen for each variable as well. ‘String’ is the default datatype for Character and
‘Integer’ is the default for Numeric.
The format and informat work as they normally do in SAS code: the informat governs how a value is read,
as in an INPUT statement, and the format governs how it is written, as in a PUT statement.
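A small Python model shows why the Type choice matters for data like Figure 1. It is a sketch of the general idea (our assumption, not the SAS engine's conversion rules): a Character variable keeps the text and truncates it to the length, while a Numeric variable reads the text as a number. ItemNum="025" keeps its leading zero only as Character. The helper read_value is hypothetical.

```python
def read_value(raw: str, var_type: str, length: int = 32):
    """Model how a text value from the XML file lands in a variable.

    Character: kept as text, cut off at `length` characters.
    Numeric:   read as a number; text that is not a number becomes
               None here, standing in for a SAS missing value.
    """
    if var_type == "Character":
        return raw[:length]
    try:
        return float(raw)  # float() ignores surrounding blanks, e.g. " 2 "
    except ValueError:
        return None


# Values taken from Figure 1.
print(read_value("025", "Character"))          # prints 025 (leading zero kept)
print(read_value("025", "Numeric"))            # prints 25.0 (leading zero lost)
print(read_value(" 2 ", "Numeric"))            # prints 2.0
print(read_value("TestName", "Numeric"))       # prints None
print(read_value("TestName", "Character", 4))  # prints Test
```

For code-like fields such as DocComCD or ItemNum, leaving the Mapper's Character default in place is the safer choice when leading zeros carry meaning.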
USING THE BOTTOM PART OF THE SCREEN
During this whole process, the bottom part of the screen often reflects what you are doing above. For
example, as you modify variable names, the ‘Contents’ tab at the bottom shows the updated SAS dataset
definition (similar to output from PROC CONTENTS, updating as you make changes above).
The Table view shows the SAS dataset as it would be read in by the current XML Map.
If you want to take a ‘behind the scenes’ look at the XML map you are creating, then click on the
‘XMLMap’ tab. This will show the actual map code that is being constructed.
Two tabs help make sure your map will work consistently and correctly: the Validation tab and the Log
tab. Both let you track how your map is progressing and whether anything in it may cause problems
down the line.
SAVING THE XML MAP
Once you have your map completed, you can save the map by choosing File – ‘Save XMLMap As’.
Choose a location and click ‘OK’.
USING THE AUTOMATICALLY GENERATED SAS CODE TO READ THE XML FILE
One of the nicest things about the XML Mapper is that it provides the SAS code needed to read your XML
file using the XML map you just created. In the bottom part of the screen, click the tab labeled “SAS Code
Example”. This code includes the location and name of both the XML file you imported and the XML map
you just saved. There are also checkboxes; each one you check adds code to the program that displays
the corresponding information. If you wish to save this SAS code, choose File – “Save SAS As” and
choose a location for the program. The program includes several PROC steps that treat the XML file as a
SAS dataset, showing you how to use the resulting SAS datasets.
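To make the structure of the generated program concrete, here is a Python sketch that assembles SAS statements of the same general shape. As we understand it, the program pairs two FILENAME statements (one for the XML file, one for the saved map) with a LIBNAME statement that uses the XML engine and its XMLMAP= option; the exact text the Mapper writes varies by version, so its "SAS Code Example" tab is the authority. The function name, file paths, and the dataset name ItemDet are hypothetical.

```python
def sas_code_example(xml_path, map_path, table, libref="SXLELIB", mapref="SXLEMAP"):
    """Build SAS statements in the general shape of the 'SAS Code Example' tab."""
    return "\n".join([
        # Fileref for the XML file; it shares the libref's name, so the
        # LIBNAME statement below needs no physical path of its own.
        f"filename {libref} '{xml_path}';",
        # Fileref for the saved XML map.
        f"filename {mapref} '{map_path}';",
        # XML engine reads the file through the map, read-only.
        f"libname {libref} xml xmlmap={mapref} access=READONLY;",
        "",
        # Optional steps (assumed to correspond to the tab's checkboxes).
        f"proc contents data={libref}.{table} varnum; run;",
        f"proc print data={libref}.{table} (obs=10); run;",
    ])


# Hypothetical locations; MyFirstMap is the map named earlier in this paper.
code = sas_code_example(r"C:\data\resolve.xml", r"C:\maps\MyFirstMap.map", "ItemDet")
print(code)
```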
Once you run your SAS program, you will have the SAS datasets defined in your XML map. These
datasets are the same as any other SAS datasets: you can merge them, sort them, and run
procedures on the data in them.
CUSTOMIZING YOUR XML MAP
If you would rather create your own SAS dataset instead of using the AutoMap feature, there are a
couple of additional steps. First, create a new SAS dataset by right-clicking the name of the map at the
top, then choosing ‘Insert’. At this point, the SAS dataset will be empty (it has no variables).
You can name this dataset using the ‘Properties’ tab at the top right. To load variables into the
dataset, drag the variables you want from the left pane and drop them on top of the new dataset name.
As you do this, watch what happens when you combine variables from several different hierarchical
levels on the left. I have found it helps to be careful about the order in which you bring in the variables: I
start at the bottom-most level of the hierarchy and move up.
For variables that come from higher up in the hierarchy, you may need to check the ‘Retain’ box on the
‘Properties’ tab so that each higher-level value is kept for all the records that belong to it. Customizing
your XML map this way allows you to create a single SAS dataset that includes all the data you need.
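The effect of ‘Retain’ can be illustrated with a short Python sketch (an analogy for the idea, not how SAS implements it). A streaming parser reads Figure 1 from top to bottom; each time it meets a GRResolveRequest start tag it remembers that element's attributes, and it copies those remembered values into every ItemDet row that follows. The function flatten_with_retain and its default tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim.
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def flatten_with_retain(text, parent_tag="GRResolveRequest", row_tag="ItemDet"):
    """Emit one row per row_tag element, carrying (retaining) the attributes
    of the most recent parent_tag element into each row."""
    parser = ET.XMLPullParser(events=("start",))
    parser.feed(text)
    parser.close()  # events not yet read remain available below
    retained, rows = {}, []
    for _event, elem in parser.read_events():
        if elem.tag == parent_tag:
            retained = dict(elem.attrib)  # new higher-level values
        elif elem.tag == row_tag:
            rows.append({**retained, **elem.attrib})
    return rows


for row in flatten_with_retain(FIGURE_1):
    print(row["ReqID"], row["ReqEndpoint"], row["ItemNum"], repr(row["Rsp"]))
```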
Keep track of any warnings and error messages using the ‘Validate’ tab in the bottom pane. It is also a
good idea to check the values in your SAS dataset against the values in the XML file for several records
to make sure your Table definition is set up correctly. Both sets of values can be viewed in tabs at the
bottom: the ‘XML Source’ tab and the ‘Table View’ tab.
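One way to do this spot check outside the Mapper is sketched below in Python: compare values copied from the Table View, row by row and in order, with the matching ItemDet elements in the XML source. The table_view values are hypothetical (row 3 was altered on purpose so the check has something to find), as is the helper spot_check; the comparison is on raw text.

```python
import xml.etree.ElementTree as ET

# The Figure 1 file, copied verbatim.
FIGURE_1 = """<?xml version="1.0" standalone="yes"?>
<MQDATA>
<MQT1.ESM.T01.GR.RESOLVE>
<GRResolveRequest ReqID="00001" ReqEndpoint="NONE"><ReqGrpLst><ReqGrpDet><ItemLst>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 2 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp=" 5 "></ItemDet>
<ItemDet DocComCD="1234567" TstID="TestName" TstFmCD="D" TstLvlCD="25" ItemNum="025"
DelElementID="1234567" Rsp="6 "></ItemDet>
</ItemLst></ReqGrpDet></ReqGrpLst></GRResolveRequest>
</MQT1.ESM.T01.GR.RESOLVE>
</MQDATA>"""


def spot_check(table, text, tag="ItemDet", columns=("ItemNum", "Rsp")):
    """Compare table rows, in order, with the <tag> elements in the XML source.
    Return the 1-based numbers of rows whose checked columns differ."""
    source = [elem.attrib for elem in ET.fromstring(text).iter(tag)]
    if len(source) != len(table):
        raise ValueError(f"table has {len(table)} rows, XML has {len(source)} <{tag}> elements")
    return [n for n, (row, attrs) in enumerate(zip(table, source), start=1)
            if any(row.get(col) != attrs.get(col) for col in columns)]


# Hypothetical values copied from the Table View; row 3's Rsp was altered on purpose.
table_view = [
    {"ItemNum": "025", "Rsp": " 2 "},
    {"ItemNum": "025", "Rsp": " 5 "},
    {"ItemNum": "025", "Rsp": "9"},
]
print(spot_check(table_view, FIGURE_1))  # [3]
```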
CONCLUSION
The XML Mapper is an extremely useful tool. It is one of the simplest ways to read XML files into SAS. It
is easy to use and allows many options to customize your datasets and variables. What I have described
above is only a beginning. I hope this will help those of you who are handed an XML file and asked to
read it into SAS at a moment’s notice.
AUTHOR CONTACT
Your comments and questions are valued and welcome. Contact the author at:
Wendi L. Wright
1351 Fishing Creek Valley Rd
Harrisburg, PA 17112
Phone: (717) 513-0027
E-mail: wendi_wright@ctb.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks
of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.