Install Open R on NPS v7.2.0.0 [Build 40805]
Introduction:
This document guides you step by step through installing Open R on your NPS machine, and
through installing and configuring R on a client so that it can communicate with Netezza.
For version 3.0.2, all the necessary files are available on the Netezza Developer Network at:
https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=266888e9-4b4b-44cd-bd51e32d05da9143#fullpageWidgetId=Wf1f7a753939e_4e8b_b2f5_c349f2f91dbb&file=e3a27390404b-460a-8bbe-930f0836b743
For this guide I used the information found in the following documents:
- Open R on Netezza-MKL.pdf
- InstallingOpenR.pdf
- Netezza_Analytics_Users_Guide.pdf
- IBM Netezza Analytics Administrator’s Guide
Server steps:
1. Download and Configure Netezza Emulator
2. Install the same INZA version on the Netezza Emulator and on the NPS machine
3. Install Open R on the Netezza Emulator
4. Install Compiled Open R code to NPS machine
5. Create INZA-enabled working database
Note: The first three steps (the emulator steps) can be skipped if a compiled binary is
already available from someone else.
Client steps:
1. Download and install R for Windows 7 64bit
2. Download and install RStudio for Windows 7 64bit
3. Download, install and configure the Netezza ODBC driver
4. Download and install additional packages for R (using RStudio)
5. Test R on Netezza
SERVER STEPS
1. Download and Configure Netezza Emulator
A. Run the virtual machines. First start HOST and, after the OS has loaded, log on to
the machine as the Netezza user nz. At this point you can run the nzstate
command to check the status of the SPU. The initial state should be: Discovering
Second, start the SPU. The SPU is loaded when the nzstate command returns
the status: Online
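Instead of re-running nzstate by hand, the wait can be scripted. The following is only a sketch: the function name wait_for_state and the STATE_CMD variable are made up for this example, and it assumes that the nzstate output contains the word Online once the system is up.

```shell
#!/bin/sh
# Poll a state command until its output mentions the wanted state.
# STATE_CMD (hypothetical) defaults to nzstate; override it to try the loop elsewhere.
wait_for_state() {
    wanted=$1      # state to wait for, e.g. Online
    attempts=$2    # maximum number of polls
    delay=$3       # seconds to sleep between polls
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ${STATE_CMD:-nzstate} 2>/dev/null | grep -q "$wanted"; then
            echo "state reached: $wanted"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for: $wanted" >&2
    return 1
}
```

Run as user nz on HOST, for example: wait_for_state Online 60 10 (up to 60 polls, 10 seconds apart).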
2. Install INZA on the Netezza Emulator
Follow the steps documented in the IBM Netezza Analytics Administrator’s Guide
3. Install Open R on the Netezza Emulator
A. Install the required Linux libraries on HOST.
To compile Open R, several Linux libraries must be installed on the
HOST machine. This step is well described in InstallingOpenR.pdf, but because of
the version of the Netezza emulator I used, an additional package must be
installed on HOST.
a. Create a directory
mkdir /nz/export/ae/dev
Then change the working directory to the dev folder. All the libraries should stay in
this folder.
b. Obtain the rpm files
The additional library to install is gmp-4.1.4-12.3_2.el5.x86_64.rpm,
which contains the libgmp.so.3 file required to compile the R package,
so first download and install this package:
wget ftp://ftp.pbone.net/mirror/atrpms.net/el5-x86_64/atrpms/testing/gmp-4.1.4-12.3_2.el5.x86_64.rpm
You need to log in as root on the HOST machine to install this rpm package:
rpm -ivh gmp-4.1.4-12.3_2.el5.x86_64.rpm
Then copy libgmp.so.3 from HOST to the SPU:
scp /usr/lib64/libgmp.so.3 root@spu0101:/usr/lib64
scp /usr/lib64/libgmp.so.3 root@spu0101:/nz/export/ae/sysroot/spu64/lib
(In the past a number of other files had to be fetched with wget, but since 3.0.1 this
is handled by installOpenSourceR.sh.)
c. Copy the script installOpenSourceR.sh to the /nz/export/ae/dev folder and
make it executable:
chmod 755 installOpenSourceR.sh
d. Start the installation:
./installOpenSourceR.sh -p <spu-root-password>
e. Depending on the RAM and CPU configured for your virtual machines, the
process takes around one hour to complete. With the configuration on my Mac,
the compilation finished in almost 17 minutes.
Note:
You can uninstall this version of R using the command
./installOpenSourceR.sh -u
You can experiment with compiling and installing different versions of R.
4. Install Compiled Open R code to NPS machine
Instructions for this step are in InstallingOpenR.pdf.
A. Create the following directory on the target machine. The directory name
must be identical to the directory name on the source machine:
/nz/extensions/nz/r_install
B. From the /nz/extensions/nz/r_install directory on the source machine,
copy the following files to the newly created directory on the target
machine:
a. sysroot_overlay.tar.gz
b. r_<version>_overlay.tar.gz
C. Also copy the installOpenSourceR.sh script to this directory.
D. Run the installOpenSourceR.sh script by entering the following command.
The SPU password is not required in this case:
./installOpenSourceR.sh -i -b
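Before copying, it can help to confirm that both overlay archives from step B actually exist on the source machine. The sketch below does only that check and prints the files to copy. The function name overlay_files is invented for this example, and the r_*_overlay.tar.gz pattern is an assumption about how the r_<version>_overlay.tar.gz file is named.

```shell
#!/bin/sh
# Print the overlay archives found in a directory, or fail if one is missing.
overlay_files() {
    dir=$1
    [ -f "$dir/sysroot_overlay.tar.gz" ] || { echo "no sysroot overlay in $dir" >&2; return 1; }
    # Expand the versioned R overlay name; an unmatched pattern stays literal and fails -f.
    set -- "$dir"/r_*_overlay.tar.gz
    [ -f "$1" ] || { echo "no R overlay in $dir" >&2; return 1; }
    echo "$dir/sysroot_overlay.tar.gz"
    for f in "$@"; do
        echo "$f"
    done
}
```

On the source machine, overlay_files /nz/extensions/nz/r_install lists the archives; each listed file can then be copied with scp to the same directory on the target machine.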
5. Create INZA-enabled working database
Create a working database on your NPS machine to store all the result tables
produced when performing analytics with the Netezza client packages for R. Do not use
system databases such as SYSTEM, NZM, NZA, NZR, NZMSG, and NZRC to store this kind of
data. The database you create must be INZA-enabled. For this example we will create the
database ANALYSIS_DB, and the owner of this database will be DEVUSER.
A. Log in to your NPS machine, launch nzsql, then execute the following commands:
CREATE USER DEVUSER WITH PASSWORD '<password>';
ALTER USER DEVUSER WITH IN GROUP inza_admins;
CREATE DATABASE ANALYSIS_DB;
ALTER DATABASE ANALYSIS_DB OWNER TO DEVUSER;
\c ANALYSIS_DB
GRANT ALL ADMIN TO DEVUSER;
To INZA-enable your database you need to have the right to run the script
create_inza_db.sh on your NPS machine. This script is found in the directory
/nz/export/ae/utilities/bin
To INZA-enable the ANALYSIS_DB run the command
./create_inza_db.sh ANALYSIS_DB
To enable rights for the DEVUSER run the command
./create_inza_db_developer.sh ANALYSIS_DB DEVUSER
Note: The INZA_DEVELOPERS group is for users who need to register new AEs,
UDXs and stored procedures.
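If the same setup has to be repeated for other databases or users, the SQL from step A can be generated rather than retyped. This is a minimal sketch: the function name inza_db_sql and the output file name are invented for this example, and the statements are simply the ones listed above with the names substituted.

```shell
#!/bin/sh
# Print the SQL that creates a user and a working database owned by that user.
# Arguments: database name, user name, password.
inza_db_sql() {
    db=$1
    user=$2
    password=$3
    cat <<EOF
CREATE USER $user WITH PASSWORD '$password';
ALTER USER $user WITH IN GROUP inza_admins;
CREATE DATABASE $db;
ALTER DATABASE $db OWNER TO $user;
\c $db
GRANT ALL ADMIN TO $user;
EOF
}
```

For example, inza_db_sql ANALYSIS_DB DEVUSER '<password>' > create_analysis_db.sql writes the script to a file, which can then be run with nzsql -f create_analysis_db.sql. The create_inza_db.sh and create_inza_db_developer.sh steps still have to be run afterwards.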
CLIENT STEPS
1. Download and install R for Windows 7 64bit
I used Windows 7 Professional 64-bit with the latest updates. For R on Windows, I used the
same R version as on the Netezza server: 3.0.2.
Download R (http://cran.r-project.org/bin/windows/base/old/3.0.2/R-3.0.2-win.exe)
and install it. No special configuration is needed.
2. Download and install latest RStudio for Windows 7 64bit
http://www.rstudio.com/products/rstudio/download/
RStudio IDE is a powerful and productive free GUI for R.
3. Download, install and configure the Netezza ODBC driver
Before you start using R with IBM Netezza Analytics, you need to install the ODBC
drivers for Windows 7. Directions are here:
https://www304.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.datacon.doc/c_datacon_installing_configuring_odbc_win.html
Add the following information:
- NZSQL (by default)
- (optional)
- INZA-enabled database: ANALYSIS_DB
- The database user: DEVUSER
- DEVUSER
After you fill all the fields, click the Test Connection button
4. Download and install additional packages for R (using RStudio)
To connect IBM Netezza Analytics with the R language, three packages must be
installed in R:
nza – the Netezza Analytics Library for R
nzr – the Netezza R library
nzmatrix – the Netezza Matrix Library for R
The Netezza R client packages 3.0.2 are included in the zip file linked at the top of
this document.
A. Install prerequisite packages with RStudio
(a) Open RStudio
(b) Choose Tools -> Install Packages
(c) In the Install Packages window, choose to install from “Repository (CRAN,
CRANextra)” and install the following packages:
- arules: Provides support for association rules
- arulesViz: Required for the visualization of association rules as provided in the nza package. It is not required for the installation of the Netezza Analytics packages nzr, nzmatrix, and nza.
- bitops: Provides functions for bitwise operations
- ca: Provides simple correspondence analysis, multiple correspondence analysis, and joint correspondence analysis
- caTools: Provides tools for moving window statistics, GIF, Base64, ROC AUC, and others
- e1071: Provides miscellaneous functions of the Department of Statistics (e1071)
- MASS: Provides support functions and datasets for Venables and Ripley's MASS
- rgl: Provides a 3D visualization device system
- RODBC: Provides ODBC database access
- rpart: Provides decision and regression trees
- tree: Provides classification and regression trees
- XML: Provides tools for parsing and generating XML within R
B. After installing all the prerequisite packages, install the additional Netezza
packages in this order: first nzr, then nza (nzr is a prerequisite), then nzmatrix.
To install them, in the Install Packages window choose to install from “Package
Archive File”.
5. Test R on Netezza
To test the additional Netezza packages from R, several datasets must be
downloaded and installed on the NPS machine. These sample datasets are freely available
on the Internet. Do the following steps:
A. Log in to the NPS machine as user nz
B. Navigate to /nz/export/ae/utilities/bin
C. Create a subdirectory testData
D. Download the datasets on your client machine:
http://fimi.ua.ac.be/data/retail.dat.gz
http://archive.ics.uci.edu/ml/databases/census-income/census.tar.gz
http://archive.ics.uci.edu/ml/machine-learning-databases/winequality/winequality-white.csv
http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/soybeanlarge.data
http://archive.ics.uci.edu/ml/machine-learning-databases/soybean/soybeanlarge.test
http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris
E. Move all the datasets to the testData subdirectory (you can use an FTP client for
this operation).
F. Install the datasets. Use the loadTestTable.sh script from the
/nz/export/ae/utilities/bin folder to create tables in the database and load the
data from the downloaded datasets into the newly created tables.
G. Set permissions for devuser on the databases.
H. Running as admin, grant the following permissions to devuser on each of these
databases: nza, nzr, nzm
GRANT ALL ON AGGREGATE, DATABASE, FUNCTION, GROUP,
PROCEDURE, USER, TABLE, EXTERNAL TABLE, SYSTEM TABLE,
SEQUENCE, SYSTEM VIEW, MANAGEMENT VIEW, MANAGEMENT
TABLE, SYNONYM, VIEW TO devuser;
I. Grant select permission on database ADMIN to devuser:
GRANT SELECT ON AGGREGATE, DATABASE, FUNCTION, GROUP,
PROCEDURE, USER, TABLE, EXTERNAL TABLE, SYSTEM TABLE,
SEQUENCE, SYSTEM VIEW, MANAGEMENT VIEW, MANAGEMENT
TABLE, SYNONYM, VIEW TO devuser;
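As an alternative to downloading the datasets on the Windows client and moving them by FTP, the download can be scripted on any Linux machine that has wget and Internet access. The sketch below only works out which URLs still need fetching. The function name pending_downloads and the file urls.txt (one dataset URL per line) are invented for this example.

```shell
#!/bin/sh
# Read URLs from standard input and print those whose file is not yet in the
# destination directory, so that a download can be resumed without repeats.
pending_downloads() {
    dest=$1
    while IFS= read -r url; do
        [ -n "$url" ] || continue     # skip blank lines
        name=${url##*/}               # file name is the last path component
        [ -e "$dest/$name" ] || echo "$url"
    done
}
```

A possible use is: pending_downloads testData < urls.txt | while read -r u; do wget -P testData "$u"; done, which fetches only the missing files into the testData directory.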
J. Test the nzr package
Open RStudio on your client and type
library(nzr)
The nzr library is loaded, along with its prerequisite libraries.
Type
demo(nzr)
When prompted, press Return.
The demo program runs and performs some calculations. At the end of processing
the connection is closed.
K. Test the nza package
Open RStudio on your client and type
library(nza)
The nza library is loaded, along with its prerequisite libraries.
Type
demo(nza)
When prompted, press Return.
This demo produces several graphs in the Plots window of RStudio.
L. Test the nzmatrix package
Open RStudio on your client and type
library(nzmatrix)
The nzmatrix library is loaded, along with its prerequisite libraries.
Type
demo(nzmatrix)
When prompted, press Return.
This demo produces a plot chart.