Install Open R on NPS v7.2.0.0 [Build 40805]

Introduction:
This document guides you step by step through installing Open R on your NPS machine, and through installing and configuring R so that it can communicate with Netezza.

For 3.0.2, all the necessary files are available in the Netezza Developer Network at:
https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=266888e9-4b4b-44cd-bd51e32d05da9143#fullpageWidgetId=Wf1f7a753939e_4e8b_b2f5_c349f2f91dbb&file=e3a27390404b-460a-8bbe-930f0836b743

This guide is based on the information found in the following documents:
- Open R on Netezza-MKL.pdf
- InstallingOpenR.pdf
- Netezza_Analytics_Users_Guide.pdf
- IBM Netezza Analytics Administrator’s Guide

Server steps:
1. Download and configure the Netezza Emulator
2. Install the same INZA version on the Netezza Emulator and on the NPS machine
3. Install Open R on the Netezza Emulator
4. Install the compiled Open R code on the NPS machine
5. Create an INZA-enabled working database
Note: The first three steps (the emulator steps) can be skipped if a compiled binary is already available from someone else.

Client steps:
1. Download and install R for Windows 7 64bit
2. Download and install RStudio for Windows 7 64bit
3. Download, install and configure the Netezza ODBC driver
4. Download and install additional packages for R (using RStudio)
5. Test R on Netezza

SERVER STEPS

1. Download and Configure Netezza Emulator
   A. Run the virtual machines.
      First, start HOST. After the OS has loaded, log on to the machine as the Netezza user: nz.
      At this point you can run the nzstate command to check the status of the SPU. The initial state should be: Discovering
      Second, start the SPU. The SPU has finished loading when the nzstate command returns the status: Online

2. Install INZA on the Netezza Emulator
   Follow the steps documented in the IBM Netezza Analytics Administrator’s Guide.

3. Install Open R on the Netezza Emulator
   A. Install required Linux libraries on HOST.
      In order to compile Open R, several Linux libraries must be installed on the HOST machine. This step is well described in InstallingOpenR.pdf, but because of the version of the Netezza emulator I used, an additional package must be installed on HOST.
      a. Create a directory
            mkdir /nz/export/ae/dev
         Then change the working directory to the dev folder. All the libraries should stay in this folder.
      b. Obtain rpm zip files
         The additional library to install is gmp-4.1.4-12.3_2.el5.x86_64.rpm, which contains the libgmp.so.3 file required for compiling the R package. First, download this package:
            wget ftp://ftp.pbone.net/mirror/atrpms.net/el5-x86_64/atrpms/testing/gmp-4.1.4-12.3_2.el5.x86_64.rpm
         You need to be logged in as root on the HOST machine in order to install this rpm package:
            rpm -ivh gmp-4.1.4-12.3_2.el5.x86_64.rpm
         Then copy libgmp.so.3 from HOST to the SPU:
            scp /usr/lib64/libgmp.so.3 root@spu0101:/usr/lib64
            scp /usr/lib64/libgmp.so.3 root@spu0101:/nz/export/ae/sysroot/spu64/lib
         (In the past, a number of additional files had to be downloaded with wget, but this has been included in installOpenSourceR.sh since 3.0.1.)
      c. Copy the script installOpenSourceR.sh into the /nz/export/ae/dev folder and make it executable:
            chmod 755 installOpenSourceR.sh
      d. Start the installation:
            ./installOpenSourceR.sh -p <spu-root-password>
      e. Depending on the RAM and CPU configured for your virtual machines, the process takes up to around one hour. With my configuration on a Mac, the compilation finished in just under 17 minutes.
      Note: You can uninstall this version of R using the command
            ./installOpenSourceR.sh -u
      You can experiment with compiling and installing different versions of R.

4. Install Compiled Open R code to NPS machine
   Instructions for this step are in InstallingOpenR.pdf.
   A. Create the following directory on the target machine. The directory name must be identical to the directory name on the source machine:
         /nz/extensions/nz/r_install
   B.
      From the /nz/extensions/nz/r_install directory on the source machine, copy the following files to the newly created directory on the target machine:
      a. sysroot_overlay.tar.gz
      b. r_<version>_overlay.tar.gz
   C. Also copy the script installOpenSourceR.sh to this directory.
   D. Call the installOpenSourceR.sh script by entering the following command. The SPU password is not required in this case:
         ./installOpenSourceR.sh -i -b

5. Create INZA-enabled working database
   Create a working database on your NPS machine to store all the result tables produced when performing analytics with the Netezza client packages for R. Do not use system databases such as SYSTEM, NZM, NZA, NZR, NZMSG, and NZRC to store this kind of data. The database you create must be INZA-enabled.
   For this example we will create the database ANALYSIS_DB, and the owner of this database will be DEVUSER.
   A. Log on to your NPS machine, launch nzsql, then execute the following commands:
         CREATE USER DEVUSER WITH PASSWORD '<password>';
         ALTER USER DEVUSER WITH IN GROUP inza_admins;
         CREATE DATABASE ANALYSIS_DB;
         ALTER DATABASE ANALYSIS_DB OWNER TO DEVUSER;
         \c ANALYSIS_DB
         GRANT ALL ADMIN TO DEVUSER;
      To INZA-enable your database you need the right to run the script create_inza_db.sh on your NPS machine. This script is found in the directory /nz/export/ae/utilities/bin
      To INZA-enable ANALYSIS_DB, run the command
         ./create_inza_db.sh ANALYSIS_DB
      To enable rights for DEVUSER, run the command
         ./create_inza_db_developer.sh ANALYSIS_DB DEVUSER
      Note: The INZA_DEVELOPERS group is for users who need to register new AEs, UDXs and stored procedures.

CLIENT STEPS

1. Download and install R for Windows 7 64bit
   I used Windows 7 Professional 64bit with the latest updates. For R on Windows, I used the same R version as on the Netezza server: 3.0.2.
   Download R (http://cran.r-project.org/bin/windows/base/old/3.0.2/R-3.0.2-win.exe) and install it. No special configuration is needed.

2.
   Download and install the latest RStudio for Windows 7 64bit
   http://www.rstudio.com/products/rstudio/download/
   The RStudio IDE is a powerful and productive free GUI for R.

3. Download, install and configure the Netezza ODBC driver
   Before you start using R with IBM Netezza Analytics, you need to install the ODBC drivers for Windows 7. Directions are here:
   https://www304.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.datacon.doc/c_datacon_installing_configuring_odbc_win.html
   Add the following information:
   - NZSQL (by default)
   - [...] (optional)
   - The INZA-enabled database: ANALYSIS_DB
   - The database user: DEVUSER
   - [...] DEVUSER
   After you have filled in all the fields, click the Test Connection button.

4. Download and install additional packages for R (using RStudio)
   In order to connect the R language to IBM Netezza Analytics, three packages must be installed in R:
      nza: the Netezza Analytics Library for R
      nzr: the Netezza R library
      nzmatrix: the Netezza Matrix Library for R
   The Netezza R client packages 3.0.2 are included in the zip file referenced at the top of this document.
   A. Install the prerequisite packages with RStudio
      (a) Open RStudio
      (b) Choose Tools -> Install Packages
      (c) In the Install Packages window, choose to install from “Repository (CRAN, CRANextra)” and install the following packages:
         arules: Provides support for association rules
         arulesViz: Required for the visualization of association rules as provided in the nza package. It is not required for the installation of the Netezza Analytics packages nzr, nzmatrix, and nza.
         bitops: Provides functions for bitwise operations
         ca: Provides simple correspondence analysis, multiple correspondence analysis, and joint correspondence analysis
         caTools: Provides tools for moving window statistics, GIF, Base64, ROC AUC, and others
         e1071: Provides miscellaneous functions of the Department of Statistics (e1071)
         MASS: Provides support functions and datasets for Venables and Ripley's MASS
         rgl: Provides a 3D visualization device system
         RODBC: Provides ODBC database access
         rpart: Provides decision and regression trees
         tree: Provides classification and regression trees
         XML: Provides tools for parsing and generating XML within R
   B. After installing all the prerequisite packages, install the Netezza additional packages in this order: first nzr, then nza (nzr is a prerequisite), then nzmatrix.
      To install them, choose to install from “Package Archive File” in the Install Packages window.

5. Test R on Netezza
   In order to test the Netezza additional packages from R, several datasets must be downloaded and installed on the NPS machine. These sample datasets are freely available on the Internet. Do the following steps:
   A. Log in to the NPS machine as user nz
   B. Navigate to /nz/export/ae/utilities/bin
   C. Create a subdirectory testData
   D. Download the datasets on your client machine:
         http://fimi.ua.ac.be/data/retail.dat.gz
         http://archive.ics.uci.edu/ml/databases/census-income/census.tar.gz
         http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-white.csv
         http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
         http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/soybeanlarge.data
         http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/soybeanlarge.test
         http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris
   E. Move all the datasets to the testData subdirectory (you can use an FTP client for this operation)
   F. Install datasets.
      Use the loadTestTable.sh script from the /nz/export/ae/utilities/bin folder to create tables in the database and load the data from the downloaded datasets into the newly created tables.
   G. Set permissions for devuser on databases
   H. Grant the following permissions to devuser, while connected as admin, on each of these databases: nza, nzr, nzm
         GRANT ALL ON AGGREGATE, DATABASE, FUNCTION, GROUP, PROCEDURE, USER, TABLE,
            EXTERNAL TABLE, SYSTEM TABLE, SEQUENCE, SYSTEM VIEW, MANAGEMENT VIEW,
            MANAGEMENT TABLE, SYNONYM, VIEW TO devuser;
   I. Grant select permission on database ADMIN to devuser:
         GRANT SELECT ON AGGREGATE, DATABASE, FUNCTION, GROUP, PROCEDURE, USER, TABLE,
            EXTERNAL TABLE, SYSTEM TABLE, SEQUENCE, SYSTEM VIEW, MANAGEMENT VIEW,
            MANAGEMENT TABLE, SYNONYM, VIEW TO devuser;
   J. Test nzr package
      Open RStudio on your client and type
         library(nzr)
      The nzr library is loaded, along with its prerequisite libraries. Type
         demo(nzr)
      When prompted, press Return.
      The demo program runs and performs some calculations. At the end of processing the connection is closed.
   K. Test nza package
      Open RStudio on your client and type
         library(nza)
      The nza library is loaded, along with its prerequisite libraries. Type
         demo(nza)
      When prompted, press Return.
      The demo draws several graphs in the Plots window of RStudio.
   L. Test nzmatrix package
      Open RStudio on your client and type
         library(nzmatrix)
      The nzmatrix library is loaded, along with its prerequisite libraries. Type
         demo(nzmatrix)
      When prompted, press Return.
      This demo produces a plot chart.
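
The server steps depend on the system having reached the Online state, which server step 1 checks by running nzstate by hand. As a minimal sketch only, that check could be scripted as shown here. The function name wait_for_state and its retry parameters are hypothetical and not part of the Netezza tooling. The sketch also assumes that nzstate is on the PATH of the nz user and that its output contains the state name (for example Online) as plain text.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Netezza tooling): poll nzstate until
# its output mentions the wanted state, or give up after a number of tries.
# Assumption: nzstate prints the state name (e.g. Online) somewhere in its output.
wait_for_state() {
    target=$1          # state to wait for, e.g. Online
    tries=${2:-60}     # maximum number of polls
    delay=${3:-10}     # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Capture the current state text; a failing nzstate counts as "not yet".
        state=$(nzstate 2>/dev/null || true)
        case "$state" in
            *"$target"*)
                echo "Reached state: $target"
                return 0
                ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Gave up waiting for state: $target" >&2
    return 1
}
```

For example, running wait_for_state Online 120 30 as user nz on HOST after starting the SPU would poll every 30 seconds for up to about an hour before giving up.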