What's New with Programmability and Scripting Jon Peck

Technical Advisor and Principal Software Engineer
Las Vegas, November 2008
Copyright (c) SPSS Inc, 2008
Do you have...
• Repetitive SPSS tasks?
• A need to reduce manual work?
• Problems you can't solve with traditional syntax?
• A need for statistical procedures not in SPSS Statistics?
• A need for higher productivity?
• A need to reduce code maintenance time?
See if programmability can help...
Agenda
• Programmability introduction
• Four examples
– Automating repetitive work: applySyntaxToFiles
– Integrating programs and scripting:
SPSSINC MODIFY TABLES
– Adding R statistical procedures:
SPSSINC QUANTILE REGRESSION
– Making your own user interface with .NET: The Statistical Explorer
SPSS Statistics embeds three programming languages
• Traditional syntax and Basic scripting are still present
• Plug-ins let you extend capabilities using
– Python
– R
– .NET languages (Windows only)
• Plug-ins are free to download or can be installed from the CD
• The SPSS Developer Central web site provides articles, SPSS-written modules, plug-ins, and user contributions
My first program
BEGIN PROGRAM PYTHON.
import spss
print "Hello, Las Vegas"
END PROGRAM.
• Python or R program code goes in the normal SPSS Statistics syntax window
Programmability combines SPSS Statistics with Python or R
• A program in the input stream can communicate with SPSS Statistics, control it, and use Python or R facilities and modules (internal mode)
spss.Submit("GET FILE='c:/data/cars.sav'.")
• A Python or .NET application can embed SPSS Statistics inside itself (external mode)
– The user interface does not appear
• A lower-level C API is also available in an SDK
Python and R are open source software
• Programmability plug-ins are an optional install
– They are free
– They let you tap the work of the Python and R communities
– Python and R have their own license agreements
– SPSS has a freeware license agreement
• SPSS is not the owner or licensor of the Python or R software. Any user of Python or R must agree to the terms of the license agreement located on the Python or R web site. SPSS is not making any statement about the quality of the Python or R programs. SPSS fully disclaims all liability associated with your use of the Python or R programs.
Programmability increases your power, flexibility, and productivity
• Generalization
– React flexibly to metadata, results, and the environment
• Automation
– Embed program logic in jobs
• Extension
– Add or extend procedures and transformations
– Tap existing R or Python statistical modules
• Integration
– Connect SPSS inputs and outputs to other agents
• And programmability is more fun
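As a minimal sketch of generalization, the Python function below builds syntax from variable metadata instead of hard-coded variable names. It assumes the metadata arrives as a plain dictionary mapping variable names to measurement levels; in a real program that information would come from spss module calls such as spss.GetVariableName and spss.GetVariableMeasurementLevel, and each command would be passed to spss.Submit. The function name summary_syntax and the sample variables are hypothetical.

```python
def summary_syntax(levels):
    """Build SPSS Statistics commands from a {name: measurement level} mapping.

    Hypothetical helper: categorical variables get FREQUENCIES,
    scale variables get DESCRIPTIVES.
    """
    categorical = [name for name, level in levels.items()
                   if level in ("nominal", "ordinal")]
    scale = [name for name, level in levels.items() if level == "scale"]
    commands = []
    if categorical:
        commands.append("FREQUENCIES VARIABLES=%s." % " ".join(categorical))
    if scale:
        commands.append("DESCRIPTIVES VARIABLES=%s." % " ".join(scale))
    return commands


# Hypothetical metadata; a real program would read it from the active dataset.
metadata = {"region": "nominal", "satisfaction": "ordinal", "income": "scale"}
for command in summary_syntax(metadata):
    print(command)
```

If a variable is added to the file or its measurement level changes, the generated syntax follows automatically; no syntax file needs editing.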
Version 17 has new programmability and scripting features
• V 15
– Processor: SPSS syntax, Python programs, .NET programs
– Front End: SaxBasic scripts, COM support
• V 16 additions
– Processor: R programs, extension commands, Dataset class
– Front End: Basic script upgrade, Python scripts
• V 17 additions
– Processor: extension improvements, program and script integration, R graphics in Viewer
– Front End: Custom Dialog Builder, new syntax editor
Programmability functionality is fully integrated into SPSS Statistics
• Programs run in the regular syntax stream
• Users can define SPSS Statistics syntax for programs and scripts via the extension mechanism
• Users can create dialog boxes and menus using the Custom Dialog Builder
– Not just for extensions or programs
• Python and R output appears in the Viewer
– plain text
– pivot tables
– charts
There are new extension commands available from Developer Central
• New for version 17
– SETSMACRO: syntax for using Variable Sets
– SPSSINC APRIORI: association rules (R)
– SPSSINC HETCOR: polychoric and polyserial correlation (R)
– SPSSINC MERGE TABLES: merge pivot tables
– SPSSINC MODIFY OUTPUT: outline titling and styling
– SPSSINC MODIFY TABLES: pivot table styling
– SPSSINC QUANTREG: quantile regression (R)
– SPSSINC RAKE: adjust weights to control totals
– SPSSINC RANFOR/RANPRED: random forests (R)
– SPSSINC RASCH: Rasch models (R)
– SPSSINC ROBUST REGR: robust regression (R)
– SPSSINC TOBIT REGR: Tobit regression (R)
• For version 16 (and 17)
– Seven others
You can create and share your own additions to SPSS Statistics
– Write Python or R functions to implement the functionality
• Use input APIs to get data to Python or R
• Use output APIs to create pivot tables
– For extensions,
• Define the syntax in an XML file
• Use tools in extension.py to receive the parsed command and pass it to the implementing function
– R extensions can be wrapped in Python code
• Use organization or author name as first word of command
– Use the Custom Dialog Builder to create the interface
• The CDB is not just for extensions
– Test!
– Package and distribute
– Contributions to Developer Central are welcome
• Documentation available at SPSS Developer Central
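The step of handing a parsed command to an implementing function can be pictured with the simplified Python sketch below. It is a hypothetical stand-in for the extension.py helpers, not their actual API: it assumes the parsed keywords arrive as a dictionary, merges them over defaults, and calls the implementing function with keyword arguments. The names dispatch and quantile_summary are illustrative only.

```python
def dispatch(parsed, implementing_function, defaults):
    """Hypothetical dispatcher: call the implementation with parsed keywords.

    Keyword names are lower-cased so that syntax such as QUANTILE=0.9
    maps onto a Python parameter named quantile.
    """
    kwargs = dict(defaults)
    for keyword, value in parsed.items():
        kwargs[keyword.lower()] = value
    return implementing_function(**kwargs)


def quantile_summary(variables, quantile=0.5):
    """Hypothetical implementing function: describe the requested analysis."""
    return "quantile %.2f for %s" % (quantile, ", ".join(variables))


# Values as they might look after the command syntax has been parsed.
parsed = {"VARIABLES": ["salary", "age"], "QUANTILE": 0.9}
print(dispatch(parsed, quantile_summary, {"quantile": 0.5}))
```

Keeping the statistical work in an ordinary function means it can be tested from plain Python before the XML syntax definition and the dialog box are added.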
Expand the audience by creating SPSS Statistics syntax and dialog boxes
Example I
Generalize and automate work
applySyntaxToFiles
• You have syntax files and need to apply them every day to datasets not known in advance
• The applySyntaxToFiles function applies a syntax file to each file matching the input specification
• Optionally saves new .sav and output files
• Optionally creates a log
• Can be run in internal or external mode
• Can be integrated into a larger process
Automating a routine process
• Apply standard processing to an unknown set of files
• Produce processed data and reports
Use a program to drive processing
begin program.
import spss, spssaux3
spssaux3.applySyntaxToFiles(inputspec="c:/temp/parts/*.sav",
    syntax="c:/userconf2008lasvegas/dailychecks.sps",
    outputdatadir="c:/temp/processed",
    outputfiledir="c:/temp/processed",
    logfile="c:/temp/processed/report.txt")
end program.
• dailychecks.sps could apply data cleaning rules, modify data, and create reports
• This program could be run daily through Production Mode or the PES job scheduler, or used interactively
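The loop behind a function like applySyntaxToFiles can be sketched as follows. This is a hypothetical approximation, not the spssaux3 source: for each file matching the input specification it generates GET FILE, INSERT FILE, and SAVE OUTFILE commands and hands them to a submit callable. In a real program that callable would be spss.Submit; here a list collects the commands so the sketch runs without SPSS Statistics. Saving Viewer output and writing the log file are omitted.

```python
import glob
import os
import tempfile


def apply_syntax_to_files(inputspec, syntax, outputdatadir, submit):
    """Hypothetical sketch: run one syntax file against every matching data file."""
    processed = []
    for infile in sorted(glob.glob(inputspec)):
        outfile = os.path.join(outputdatadir, os.path.basename(infile))
        # Open the data, apply the standard processing, save the result.
        submit("GET FILE='%s'." % infile)
        submit("INSERT FILE='%s'." % syntax)
        submit("SAVE OUTFILE='%s'." % outfile)
        processed.append(infile)
    return processed


# Demonstration with empty placeholder files and a list instead of spss.Submit.
workdir = tempfile.mkdtemp()
for name in ("a.sav", "b.sav", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

commands = []
done = apply_syntax_to_files(os.path.join(workdir, "*.sav"), "dailychecks.sps",
                             os.path.join(workdir, "processed"), commands.append)
print(len(done), "files,", len(commands), "commands")
```

Passing the submit function as a parameter keeps the file-matching logic testable on its own; the same loop works in internal or external mode once spss.Submit is supplied.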
Example II
Use integrated scripting for better table presentation
SPSSINC MODIFY TABLES extension command
• TableLooks provide static formatting for entire areas of a table
– data cells
– row and column layers
• You want tables with formatting beyond TableLooks
• Many users copy tables to Excel and format them manually
• Basic and Python scripting provide a programmatic way to do formatting
• SPSSINC MODIFY TABLES provides syntax for extensive formatting
– Eliminates the need to know scripting
– Uses the extension mechanism for programs and Python scripting
Use dynamic highlighting to make a crosstab table easier to read
SPSSINC MODIFY TABLES SUBTYPE='Crosstabulation'
DIMENSION=ROWS SELECT='Std. Residual'
/STYLES TEXTSTYLE=BOLD BACKGROUNDCOLOR=255 0 0
APPLYTO='abs(x) >2'.
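To see what an APPLYTO condition such as 'abs(x) >2' selects, the hypothetical Python sketch below evaluates the expression once per cell, with x bound to the cell value. It illustrates the idea only and is not the extension's implementation; eval is acceptable here only because the expression comes from trusted syntax.

```python
def cells_to_style(values, condition):
    """Return the indexes of cells whose value x satisfies the condition."""
    selected = []
    for index, x in enumerate(values):
        # Only x and abs() are visible to the expression.
        if eval(condition, {"__builtins__": {}, "abs": abs}, {"x": x}):
            selected.append(index)
    return selected


# Hypothetical standardized residuals from one row of a crosstab.
std_residuals = [0.4, -2.7, 1.9, 3.1, -0.2]
print(cells_to_style(std_residuals, "abs(x) >2"))  # [1, 3]
```

Only the cells holding -2.7 and 3.1 would receive the bold, red-background style; the others keep their TableLook formatting.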
Custom dialog boxes are easy to create
• Dialog created with Custom Dialog Builder
• Generates extension command syntax
• Easy to distribute
Use static formatting to call out parts of a table
SPSSINC MODIFY TABLES subtype='variables in the equation'
SELECT="B" "Sig."
/STYLES TEXTCOLOR = 0 0 255
BACKGROUNDCOLOR=0 255 0.
Format CTABLES totals to call them out
SPSSINC MODIFY TABLES SUBTYPE="Custom Table"
SELECT = "Total" DIMENSION=ROWS
/STYLES BACKGROUNDCOLOR=255 255 88
TEXTSTYLE = BOLD.
Use custom functions for special effects
SPSSINC MODIFY TABLES SUBTYPE='Report' SELECT="<<ALL>>"
/STYLES APPLYTO=DATACELLS TEXTCOLOR=255 255 255
TEXTSTYLE=BOLD
CUSTOMFUNCTION="customstylefunctions.washColumnsBlue".
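def washColumnsBlue(obj, i, j, numrows, numcols, section, more):
    # Shade cells from light gray in the first column to blue in the last.
    # RGB() is assumed to be the color helper defined in customstylefunctions.
    mincolor = 150.
    maxcolor = 255.
    # Guard against division by zero when the table has a single column.
    increment = (maxcolor - mincolor) / max(numcols - 1, 1)
    colorvalue = int(round(mincolor + increment * j))
    obj.SetBackgroundColorAt(i, j, RGB((int(mincolor), int(mincolor), colorvalue)))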
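The color arithmetic in washColumnsBlue can be checked without SPSS Statistics. The sketch below, using a hypothetical helper named wash_column_blues, computes the blue component assigned to each column with the same minimum (150) and maximum (255) as the custom function; red and green stay at the minimum, so the columns shift from light gray toward blue.

```python
def wash_column_blues(numcols, mincolor=150.0, maxcolor=255.0):
    """Blue component for each column, spread evenly from mincolor to maxcolor."""
    # Guard against a single-column table, where numcols - 1 would be zero.
    increment = (maxcolor - mincolor) / max(numcols - 1, 1)
    return [int(round(mincolor + increment * j)) for j in range(numcols)]


print(wash_column_blues(4))  # [150, 185, 220, 255]
```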
It is possible to get carried away with this
Example III
Add R procedures seamlessly to SPSS Statistics
R procedures can be accessed from SPSS Statistics using the R plug-in
• R
– has many advanced statistical procedures
– is a complex language for programming statistics
– is difficult to learn
• The R plug-in makes it easy to use R packages
– SPSS Statistics datasets and Viewer output can be processed by R using the plug-in
– Graphical, text, and table output appear in the Viewer
– New SPSS datasets can be created from R
– R can communicate with SPSS Statistics via 30 APIs
– Integration requires writing some R wrapper code
• Plug-in is downloadable from Developer Central
Quantile regression models conditional quantiles
• Ordinary regression models the conditional mean
• Median regression is the 50th percentile (0.5 quantile) case
• Estimating quantiles is useful when the data show varying spread, asymmetry, or outliers
• Areas of application include
– empirical finance
• value at risk
• mutual fund investment styles
• credit scoring
– school quality
– demand analysis
– others
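The criterion behind quantile regression can be shown with its loss function. For a quantile tau between 0 and 1, quantile regression minimizes the "check" loss rho_tau(u) = u * (tau - I(u < 0)) of the residuals u, where ordinary regression minimizes squared residuals. The Python sketch below applies it to the simplest possible model, a single constant, and finds the best value by trying each data point (for this piecewise-linear loss an optimum always lies at one). It illustrates the criterion only; R's quantreg fits full regression models with linear-programming methods. The sample data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1 if u < 0 else 0))


def best_constant(data, tau):
    """Constant c minimizing total check loss; an optimum lies at a data point."""
    return min(sorted(data),
               key=lambda c: sum(check_loss(y - c, tau) for y in data))


data = [1, 2, 3, 4, 100]                 # hypothetical sample with an outlier
print(sum(data) / float(len(data)))      # mean: 22.0
print(best_constant(data, 0.5))          # median: 3
print(best_constant(data, 0.25))         # lower quartile: 2
```

The outlier pulls the mean to 22, while the 0.5 and 0.25 quantiles stay at 3 and 2, which is why quantile estimates are useful with outliers and asymmetric spread.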
SPSSINC QUANTILE REGRESSION extension embeds R quantreg
Plots and pivot tables appear in the Viewer
New datasets appear in Data Editor windows
Example IV
The .NET plug-in provides an easy way to build your own .NET user interface for SPSS Statistics
The Statistical Explorer uses the .NET plug-in
The Statistical Explorer
The statistics and chart shown depend on the variable's measurement level
Where we have been today
• SPSS Statistics 17 completes the programmability and scripting building blocks
• Unification of programs and scripts
• Custom Dialog Builder
• Extension improvements and new extensions
• R integration and graphics in Viewer
• SPSS Developer Central is your friend
Questions?
Programmability increases your power, flexibility, and productivity
• Generalization and automation
– applySyntaxToFiles
– SPSSINC MODIFY TABLES
• Extension
– SPSSINC MODIFY TABLES using integrated scripting
– SPSSINC QUANTREG using R
– Many new extension commands available
• Integration
– applySyntaxToFiles as part of a process
• And it's still more fun
Contact
Download