Programmability in SPSS 15
The Revolution Continues
Jon Peck
Technical Advisor
SPSS
Copyright (c) SPSS Inc, 2006

Agenda
- Recap of SPSS 14 Python programmability
- Developer Central
- New features in SPSS 15 programmability
- Writing first-class procedures
- Updating the data
- The Bonus Pack modules
- Interacting with the user
- Q&A
- Conclusion

Quotations from SPSS Users
- "Because of programmability, SPSS 14 is the most important release since I started using SPSS fifteen years ago."
- "I think I am going to like using Python."
- "Python, here I come!"
- "I now think Python is an amazing language."
- "Python and SPSS 14 and later are, IMHO, GREAT!"
- "By the way, Python is a great addition to SPSS."

The Combination of SPSS and Python
- SPSS provides a powerful engine for statistical and graphical methods and for data management.
- Python® provides a powerful, elegant, and easy-to-learn language for controlling and responding to this engine.
- Together they provide a comprehensive system for serious applications of analytical methods to data.

Programmability Features in SPSS 14 and 15
- SPSS 14.0 provided
  - Programmability
  - Multiple datasets
  - Variable and File Attributes
  - Programmability read-access to case data
  - Ability to control SPSS from a Python program
- SPSS 15 adds
  - Read and write case data
  - Create new variables directly rather than generating syntax
  - Create pivot tables and text blocks via backend APIs
  - Easier setup

Programmability Advantages
- Makes possible jobs that respond to datasets, output, environment
- Allows greater generality, more automation
- Makes jobs more robust
- Allows extending the capabilities of SPSS
- Enables better organized and more maintainable code
- Facilitates staff specialization
- Increases productivity
- More fun

Programmability Overview
- Python extends SPSS via
  - General programming language
  - Access to variable dictionary, case data, and output
  - Access to standard and third-party modules
  - SPSS Developer Central modules
  - Module structure for building libraries of code
- Runs in "back-end" syntax context (like macro)
  - SaxBasic scripting runs in "front-end" context
- Two modes
  - Traditional SPSS syntax window
  - Drive SPSS from Python (external mode)
- Optional install

Legal Notice
- SPSS is not the owner or licensor of the Python software.
- Any user of Python must agree to the terms of the Python license agreement located on the Python web site.
- SPSS is not making any statement about the quality of the Python program.
- SPSS fully disclaims all liability associated with your use of the Python program.

The SPSS Programmability SDK
- Supports implementing various programming languages
  - Requires a programmer to implement a new language
- VB.NET Plug-In available on Developer Central
  - Works only in external mode

How Programmability Works
- Python interpreter embedded within SPSS
- SPSS runs in traditional way until BEGIN PROGRAM command is found
- Python collects commands until END PROGRAM command is found; then runs the program
- Python can communicate with SPSS through APIs (calls to functions)
  - Includes running SPSS syntax inside Python program
  - Includes creating macro values for later use in syntax
- Python can access SPSS output and data
  - OMS is a key tool

Example: Summarize Categorical Variables

    BEGIN PROGRAM.
    import spss, spssaux
    spssaux.GetSPSSInstallDir("SPSSDIR")
    spssaux.OpenDataFile("SPSSDIR/employee data.sav")

    # find categorical variables
    catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
    if catVars:
        spss.Submit("FREQ " + " ".join(catVars.variables))

    # create a macro listing categorical variables
    spss.SetMacroValue("!catVars", " ".join(catVars.variables))
    END PROGRAM.

    DESC !catVars.
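
The syntax-generation step in this example can be tried without SPSS installed. The sketch below is only an illustration: the dictionary `variable_levels` and the helper names are hypothetical stand-ins for what `spssaux.VariableDict` would supply, and the strings it builds are the ones the example would hand to `spss.Submit` and `spss.SetMacroValue`.

```python
# Hypothetical stand-in for the SPSS variable dictionary:
# variable name -> measurement level. The names are illustrative only.
variable_levels = {
    "gender": "nominal",
    "jobcat": "ordinal",
    "salary": "scale",
    "minority": "nominal",
}


def categorical_variables(levels):
    """Return the names whose level is nominal or ordinal, sorted by name."""
    return sorted(name for name, level in levels.items()
                  if level in ("nominal", "ordinal"))


def build_frequencies_syntax(names):
    """Build the command text that would be passed to spss.Submit."""
    return "FREQ " + " ".join(names)


cat_vars = categorical_variables(variable_levels)
syntax = ""
if cat_vars:
    syntax = build_frequencies_syntax(cat_vars)

# Text that the !catVars macro would expand to in later syntax.
macro_value = " ".join(cat_vars)

print(syntax)
print(macro_value)
```

Keeping the string building in small functions like these makes it possible to check the generated syntax before any of it is submitted to SPSS.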

Programmability Inside or Outside SPSS
- Two modes of operation
  - SPSS Drives mode (inside): traditional syntax context
    - BEGIN PROGRAM
      …program…
      END PROGRAM
  - X Drives mode (outside): eXternal program drives SPSS
    - Python interpreter (or VB.NET)
    - import spss
    - No SPSS Viewer, Data Editor, or SPSS user interface
    - Output sent as text to the application; can be suppressed
    - Has performance advantages
- Build programs with an IDE
  - Even if they are to be run in traditional mode

PythonWin IDE Controlling SPSS

Python Resources
- Python.org
  - Python Tutorial
  - Global (standard) Module Index
  - Python help system and help command
- Cheeseshop
  - 1627 packages as of Sept 21, 2006
- SPSS Developer Central
- SPSS Programming and Data Management, 3rd ed, 2006.
- Many books
  - Look for books at the Python 2.4 level

Python Books
- Dive Into Python book or PDF
- Practical Python by Magnus Lie Hetland
  - Extensive examples and discussion of Python
- Python Cookbook, 2nd ed by Martelli, Ravenscroft, & Ascher
- Second edition (July, 2006) of Martelli, Python in a Nutshell, O'Reilly
  - Very clear, comprehensive reference material
- wxPython in Action by Rappin and Dunn
  - Explains user interface building with wxPython

Cheeseshop: scipy
- scipy 0.5.0: Scientific Algorithms Library for Python
- scipy is an open source library of scientific tools for Python.
- scipy gathers a variety of high level science and engineering modules together as a single package.
- scipy provides modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, genetic algorithms, ODE solvers, special functions, and more.
- scipy requires and supplements NumPy, which provides a multidimensional array object and other basic functionality.
- scipy rework currently in beta
- Visit Scipy.org

SPSS Developer Central
- New Web home for developing SPSS applications
- SPSS Developer Central (old URL: forums.spss.com/code_center)
- Python Integration Plug-Ins
- Useful supplementary modules by SPSS and others
  - Updated for SPSS 15
- Articles on programmability and graphics
- Place to ask questions and exchange information
- Programmability Extension SDK
- Get Python itself from Python.org
  - SPSS uses 2.4 (2.4.3)
- Not limited to programmability
- Went live 21-May-2006
- Key Supplementary Modules
  - spssaux
  - spssdata
  - New for SPSS 15: trans, extendedTransforms, rake, pls
- GPL graphics
- User-contributed code

Approaches to Creating New Procedures
- You can extend SPSS capabilities by building new procedures
  - Or use ones that others have built
- Combine SPSS procedures and transformations with Python logic
  - Poisson regression (SPSS 14) example using iterated CNLR
  - New raking procedure built over GENLOG
- Calculate data aggregates in SPSS and pass to algorithm coded in Python
  - Raking procedure starts with AGGREGATE
- Acquire case data and compute in Python
  - Use Python standard modules and third-party additions
  - Partial Least Squares Regression (pls module)

Adapt Existing Code Libraries
- Common to adapt existing libraries or code for use as Python extension modules
  - C, C++, VB, Fortran,...
- Extension modules are normal Python modules
- Python itself written in C
  - Many standard modules are C code
- Python tools and APIs to assist
  - Chap 25 in Python in a Nutshell
  - Tutorial on extending and embedding the Python interpreter

Partial Least Squares Regression
- Regression with large number of predictors (even k > N)
- Similar to Principal Components but considers dependent variable simultaneously
  - Calculates principal components of (y, X), then uses regression on the scores instead of original data
- User chooses number of factors
- Equivalent to ordinary regression when number of factors equals number of predictors and there is one y variable
- For more information see An Optimization Perspective on Kernel Partial Least Squares Regression.pdf.

The pls Module
- Strategy
  - Fetches data from SPSS
  - Uses scipy matrix operations to compute results
    - Third-party module from Cheeseshop
  - Writes pivot tables to SPSS Viewer
    - Subject to OMS
    - SPSS 14 viewer module created pivot table using OLE automation
  - Saves predicted values to active dataset

pls Example: REGRESSION vs PLS

    GET FILE="c:/spss15/tutorial/sample_files/car_sales.sav".
    REGRESSION
      /STATISTICS COEFF R
      /DEPENDENT sales
      /METHOD=ENTER curb_wgt engine_s fuel_cap horsepow length mpg
        price resale type wheelbas width .

    begin program.
    import spss, pls
    pls.plsproc("sales", """curb_wgt engine_s fuel_cap horsepow length mpg
    price resale type wheelbas width""", yhat="predsales")
    end program.
- plsproc defaults to five factors

Results
- PLS with 5 factors almost equals regression with 11 variables

Raking Sample Weights
- "Raking" adjusts sample weights to control totals in n dimensions
  - Example: data classified by age and sex with known population totals or proportions
- Calculated by fitting a main effects loglinear model
  - Various adjustments required
- Not a complete solution to reweighting
- Not directly available in SPSS

Raking Module
- Strategy: combine SPSS procedures with Python logic
- rake.py (part of SPSS 15 Bonus Pack)
  - Aggregates data via AGGREGATE to new dataset
  - Creates new variable with control totals
  - Applies GENLOG, saving predicted counts
  - Adjusts predicted counts
  - Matches back into original dataset
    - Does not use MATCH FILES or require a SORT command
- Written in one (long) day

    rake.rake("age sex", [{0: 1140, 1: 1140}, {0: 104.6, 1: 2175.4}],
              finalweight="finalwt")

Extending SPSS Transformations
- SPSS 14 programmability can wrap SPSS syntax in Python logic
  - Useful when definitions can be expressed in SPSS syntax
- SPSS 15 programmability can generate new variables directly
  - Cursor can have accessType='w'
- SPSS 15 programmability can add cases directly
  - Cursor can have accessType='a'
- SPSS 15 programmability can create new datasets from scratch
  - Cursor can have accessType='n'
- spssdata module on Developer Central updated to support these modes

trans and extendedTransforms Modules
- trans module facilitates plugging in Python code to iterate over cases
  - Runs as an SPSS procedure
  - Passes the data
  - Adds variables to the SPSS variable dictionary
- Can apply any calculation casewise
- Use with
  - Standard Python functions (e.g., math module)
  - Any user-written functions or appropriate classes
  - Functions in extendedTransforms module

trans and extendedTransforms Modules (continued)
- trans strategy
  - Pass case data through Python code
    writing result back to SPSS in new variables
- extendedTransforms: collection of ten functions to apply to SPSS variables
  - Regular expression search/replace
  - Template-based substitution
  - soundex and nysiis functions for phonetic equivalence
  - Levenshtein distance function for string similarity
  - Date/time conversions based on patterns

Python Regular Expressions
- Pattern matching in text strings
- If you use SPSS index or replace, you need these
- Standardize string data (Mr, Mr., Herr, Senor,...)
- Patterns can be simple strings (as with SPSS index) or complex patterns
- Pick out variable names with common parts

Regular Expressions: A Few Examples
- "age": string containing the letters age
- "\bage\b": string containing the word age
- "abc|xyz|pqrst": string containing any of abc etc
- "\d+": a string of one or more digits
- "^x.*y$": a string starting with x and ending with y
- Can be case sensitive or not
- Can greatly simplify code currently using SPSS index and replace functions

Using trans and extendedTransforms search Function

    import spss, trans, spssaux, extendedTransforms
    spssaux.OpenDataFile("c:/data/names.sav")
    tproc = trans.Tfunction(listwiseDeletion=True)
    tproc.append(extendedTransforms.search, 'match', 'a8',
        ['names', trans.const('Peck|Pech|Pek')])
    tproc.append(extendedTransforms.search, 'matchignorecase', 'a8',
        ['names', trans.const('peck'), trans.const(True)])
    tproc.append(extendedTransforms.search, ('match2', 'startpos', 'length'),
        ('a12', 'f4.0', 'f4.0'), ['names', trans.const('Peck')])
    tproc.execute()
    spss.Submit("SELECT IF length > 0")
    spssaux.SaveDataFile("c:/temp/namesplus.sav")

Using trans: Writing Your Own Function

    begin program.
    import trans, re

    def splitAndExtract(s):
        """split a string on "--" and return the left part and the number in the right part.
        Ex: "simvastatin-- PO 80mg TAB" -> "simvastatin", 80"""
        parts = s.split("--")
        try:
            # first run of digits after the "--", converted so the value is numeric
            number = float(re.search(r"\d+", parts[1]).group())
        except (IndexError, AttributeError):
            # no "--" in the string, or no digits in the right part
            number = None
        return parts[0], number

    tproc = trans.Tfunction()
    tproc.append(splitAndExtract, ("name", "number"), ("a30", "f5.0"), ["medicine"])
    tproc.execute()
    end program.

extendedTransforms soundex and nysiis
- Algorithms for approximating phonetic equivalence of names
- soundexallwords can be used on unstructured text
- Applied to database of 20,000+ surnames

    import spss, trans, spssaux, extendedTransforms
    spssaux.OpenDataFile("c:/data/names.sav")
    tproc = trans.Tfunction()
    tproc.append(extendedTransforms.soundex, 'soundex', 'a5', ['names'])
    tproc.append(extendedTransforms.nysiis, 'nysiis', 'a20', ['names'])
    tproc.execute()
    spssaux.SaveDataFile("c:/temp/namesplusplus.sav")

Results

soundex on Unstructured Text
- (Overly) simple processing of unstructured text
- Use soundex word by word to abstract spelling
  - No stemming, linguistic analysis etc
  - Use STAFS for serious work
- Very simple to use

    begin program.
    import spss, trans, extendedTransforms
    t = trans.Tfunction()
    t.append(extendedTransforms.soundexallwords, 'allsoundexn66', 'a108', ['n_66'])
    t.execute()
    end program.
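
To make the word-by-word idea concrete, here is a minimal pure-Python sketch of the classic American Soundex rules. It is not the extendedTransforms code: the function names `soundex` and `soundex_all_words` are illustrative stand-ins, and the real module's output width and edge-case handling may differ.

```python
import re

# Classic Soundex digit groups; vowels, h, w and y carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of one word ('' if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        # Append a digit only when it differs from the preceding code.
        if code and code != previous:
            result += code
        # h and w do not separate repeated codes; vowels do.
        if ch not in "hw":
            previous = code
    # Pad with zeros or truncate to one letter plus three digits.
    return (result + "000")[:4]


def soundex_all_words(text):
    """Apply soundex to each alphabetic word and join the codes with spaces."""
    return " ".join(soundex(w) for w in re.findall(r"[A-Za-z]+", text))


print(soundex_all_words("Peck met Pech"))
```

Because spelling variants such as Peck, Pech and Pek collapse to the same code, comparing codes rather than raw strings gives a crude form of fuzzy matching, with none of the stemming or linguistic analysis the slide warns is missing.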

soundex on Unstructured Text

Creating a Graphical User Interface
- Python comes with Tkinter, a GUI toolkit
- There are better ones freely downloadable
  - E.g., wxPython
  - Visit wxpython.org
- Very easy to do small user interactions
- Examples
  - Message box
  - File chooser
  - Variable picker

Simple Message Box Using wxPython

Simple File Chooser Using wxPython

Variable Picker Using wxPython

Other New spss Module APIs
- User-missing values
  - GetVarMissingValues
  - GetSPSSLowHigh
- Pivot table APIs
  - BasePivotTable
  - CellText
  - Dimension
- Output text block support
  - Good for writing comments to the Viewer
- Miscellaneous
  - GetWeightVar
  - HasCursor
  - SplitChange

Recap
- SPSS 14 introduced major programmability features
- SPSS 15 adds
  - Reading and writing case data: new variables; new cases
  - Creating pivot tables and text blocks
  - Writing first-class SPSS procedures
- Bonus Pack and Partial Least Squares modules illustrate these features
- Developer Central improves ability to provide modules and information
  - Will soon have four new SPSS 15 modules

Questions?

SPSS 15: The Revolution Continues
- SPSS 15 programmability makes it easy to add capabilities beyond what is already built in to SPSS
- SPSS 15 makes it easier to build complete applications on top of SPSS
- SPSS 15 programmability makes you more productive
- SPSS 15 has lots of other great features, too
- Try it out

Write to Me!