plastic - Database Systems Lab

PLASTIC: Installation and Execution Notes
Plastic is a value-addition tool that reduces query optimization overheads through
plan recycling. It is under development at the Database Systems Lab, Indian
Institute of Science, Bangalore, India. The technical details of Plastic are
available here. The following notes guide the installation and execution of Plastic.
Installation for DB2 on Windows:
The following installation procedure is known to work with DB2 V8.1 on Windows XP
Version 2002, Service Pack 1.
Note: The user account in which Plastic is set up must have administrative privileges.
A) Steps to initialize the System
1. Install DB2 with complete support for Java and documentation.
Download Sites:
To download DB2 V8.1 for Windows:
https://www6.software.ibm.com/dl/db2udbdl/db2udbdli?S_TACT=&S_CMP=&S_PKG=d8pew32ww&x=50&y=8
For the latest version of Java and its documentation:
http://java.sun.com/j2se/1.4.2/download.html
For an introduction to DB2, read the DB2 manuals (or run the db2help command to open
the online help). The manuals are available in HTML in the doc directory of the DB2
installation.
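2. Create a database, say 'tpch', which will contain the database tables.
To load the tables into the database, use the DB2 IMPORT utility. Use the COMMITCOUNT
option while importing: IMPORT then commits after every <value> rows, which keeps the
transaction log from filling up during large loads. The command template is given below:
IMPORT FROM <filename> OF <filetype> [MODIFIED BY <modifiers>] COMMITCOUNT <value>
INSERT INTO <table-name>
Sample command to load data (the TPC-H .tbl files separate columns with '|' rather
than DB2's default ',', so the column delimiter is overridden with COLDEL|):
IMPORT FROM part.tbl OF DEL MODIFIED BY COLDEL| COMMITCOUNT 100000 INSERT INTO part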
Note:
While loading data into tables, it is preferable not to define indexes, primary keys,
or foreign keys beforehand, since maintaining them slows down insertion. They can be
added after the data is loaded, using 'alter table' for the key constraints and
'create index' for additional indexes.
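The IMPORT commands for all eight TPC-H tables can be generated instead of typed one by one. The following is a minimal sketch (Java 8 or later), assuming the dbgen output files are named <table>.tbl, use '|' as the column separator (hence MODIFIED BY COLDEL|), and sit in the directory where the script is run. The class name Db2ImportScript is hypothetical and not part of Plastic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints a DB2 CLP script that imports the TPC-H .tbl files.
class Db2ImportScript {
    // The eight tables of the TPC-H schema.
    static final String[] TPCH_TABLES = {
        "region", "nation", "part", "supplier",
        "partsupp", "customer", "orders", "lineitem"
    };

    // One IMPORT command; COLDEL| makes '|' the column delimiter for the DEL format.
    static String importCommand(String table, int commitCount) {
        return "IMPORT FROM " + table + ".tbl OF DEL MODIFIED BY COLDEL| COMMITCOUNT "
                + commitCount + " INSERT INTO " + table;
    }

    // Full script: connect, one ';'-terminated IMPORT per table, then disconnect.
    static List<String> script(String database, int commitCount) {
        List<String> lines = new ArrayList<>();
        lines.add("CONNECT TO " + database + ";");
        for (String table : TPCH_TABLES) {
            lines.add(importCommand(table, commitCount) + ";");
        }
        lines.add("CONNECT RESET;");
        return lines;
    }

    public static void main(String[] args) {
        for (String line : script("tpch", 100000)) {
            System.out.println(line);
        }
    }
}
```

Redirect the output to a file such as import.clp and run it from a DB2 command window with db2 -tvf import.clp. Keeping the commands in a file also stops the Windows shell from treating '|' as a pipe.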
3. Create the explain tables.
These are the tables in which the optimizer stores information about the plan it
selects. To create them, connect to the 'tpch' database (db2 CONNECT TO tpch) and run
the following command from the directory that contains sqllib (or give the full path
to EXPLAIN.DDL):
db2 -tf sqllib\misc\EXPLAIN.DDL
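Once the explain tables exist, a program can ask DB2 to explain a query and read the result back. The sketch below (Java 8 or later) is only an illustration, not Plastic's own code. It assumes the explain tables were created in the connected user's default schema and uses the EXPLAIN PLAN FOR statement together with the EXPLAIN_STATEMENT table, where EXPLAIN_LEVEL 'P' marks the optimized plan. The class name Db2ExplainSketch is hypothetical.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch: explain a query in DB2 and read the estimated plan cost.
class Db2ExplainSketch {
    // Fills the explain tables for the query without executing it.
    static String explainStatement(String query) {
        return "EXPLAIN PLAN FOR " + query;
    }

    // Cost of the most recently explained optimized plan (EXPLAIN_LEVEL = 'P').
    static final String LATEST_PLAN_COST =
            "SELECT TOTAL_COST FROM EXPLAIN_STATEMENT WHERE EXPLAIN_LEVEL = 'P' "
            + "ORDER BY EXPLAIN_TIME DESC FETCH FIRST 1 ROWS ONLY";

    static double explainCost(Connection conn, String query) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(explainStatement(query));
            try (ResultSet rs = st.executeQuery(LATEST_PLAN_COST)) {
                if (!rs.next()) {
                    throw new SQLException("No plan found in EXPLAIN_STATEMENT");
                }
                return rs.getDouble(1);
            }
        }
    }
}
```

TOTAL_COST is expressed in timerons, DB2's abstract cost unit.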
4. Plastic assumes that detailed statistics are available for all the tables in the
database. These statistics can be collected with the following command:
RUNSTATS ON TABLE <schema>.<table_name> WITH DISTRIBUTION AND
DETAILED INDEXES ALL SHRLEVEL CHANGE
In DB2 V8, the table name must be qualified with its schema, which is the user name
under which the table was created.
Example:
RUNSTATS ON TABLE <schema>.customer WITH DISTRIBUTION AND DETAILED
INDEXES ALL SHRLEVEL CHANGE
Here,
WITH DISTRIBUTION: Collects distribution statistics (frequent values and quantiles)
on the table's columns.
AND DETAILED INDEXES ALL: Also collects statistics on all existing indexes of the
table, including the extended (detailed) index statistics.
SHRLEVEL CHANGE: Allows other users to read from and write to the table while the
statistics are calculated.
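Because statistics are needed for every table, the RUNSTATS commands can also be generated. The following is a minimal sketch (Java 8 or later), assuming the eight standard TPC-H tables and a schema name passed as the first command-line argument; if no argument is given, the placeholder <schema> is printed. The class name Db2RunstatsScript is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints schema-qualified RUNSTATS commands for every TPC-H table.
class Db2RunstatsScript {
    static final String[] TPCH_TABLES = {
        "region", "nation", "part", "supplier",
        "partsupp", "customer", "orders", "lineitem"
    };

    // Same options as in the notes: distribution statistics, detailed index
    // statistics, and read/write access for other users while it runs.
    static String runstats(String schema, String table) {
        return "RUNSTATS ON TABLE " + schema + "." + table
                + " WITH DISTRIBUTION AND DETAILED INDEXES ALL SHRLEVEL CHANGE";
    }

    // One ';'-terminated command per table, ready for db2 -tvf.
    static List<String> script(String schema) {
        List<String> commands = new ArrayList<>();
        for (String table : TPCH_TABLES) {
            commands.add(runstats(schema, table) + ";");
        }
        return commands;
    }

    public static void main(String[] args) {
        String schema = args.length > 0 ? args[0] : "<schema>";
        for (String command : script(schema)) {
            System.out.println(command);
        }
    }
}
```

Save the output to a file and run it with db2 -tvf in a DB2 command window that is already connected to 'tpch'.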
5. Additional settings (optional)
DB2 UPDATE DATABASE CFG FOR tpch USING SORTHEAP 512
This raises the memory allocated for sorts above the default (SORTHEAP is specified
in 4 KB pages, so 512 pages is 2 MB). As a result, additional plans whose joins need
more sort memory are also considered during query optimization.
DB2SET DB2_HASH_JOIN=Y
This enables hash joins.
DB2SET DB2_EXTENDED_OPTIMIZATION=ON
This lets the query optimizer use optimization extensions to improve query
performance.
Registry variables set with db2set take effect only after the DB2 instance is
restarted (db2stop, then db2start).
B) Execution
1. Required Configuration
a. Java(TM) 2 Runtime Environment, version 1.4.x or higher.
The code was tested with Java version 1.4.2_05 on Windows XP Version 2002,
Service Pack 1.
b. Set the CLASSPATH variable. It should contain the following:
Java installation packages: the jar files of the JRE system library, which are in
the directories j2re1.4.2_05\lib and j2re1.4.2_05\lib\ext.
Jar files in the directory 'packages':
db2java.zip, classes111.zip, sql4j.jar, gnu-regexp.jar.
c. Include the lib directory of DB2 in the LD_LIBRARY_PATH variable (on Windows,
native libraries are located through the PATH variable instead).
d. As an optimization, Plastic uses the C4.5 decision tree classifier.
To use this optimization, install the weka-3-4 package and add the files weka.jar
and weka-src.jar to the CLASSPATH.
Weka is a Java-based collection of machine learning algorithms for solving real-world
data mining problems.
Download Site:
http://prdownloads.sourceforge.net/weka/weka-3-4-2.exe?download
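The CLASSPATH described in items b and d can be assembled in code, which is handy when Plastic and Weka are unzipped to different locations. The following is a minimal sketch (Java 8 or later), assuming the jars listed above sit in a 'packages' subdirectory of the Plastic directory and the Weka jars sit in the Weka installation directory. Both directory arguments and the class name PlasticClasspath are hypothetical. The sketch leaves out the JRE's lib and lib\ext jars because a standard Java 1.4 JRE already loads those.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds a CLASSPATH value for Plastic.
class PlasticClasspath {
    // Jars shipped in Plastic's 'packages' directory, as listed in the notes.
    static final String[] PLASTIC_JARS = {
        "db2java.zip", "classes111.zip", "sql4j.jar", "gnu-regexp.jar"
    };
    // Optional Weka jars, needed only for the decision tree optimization.
    static final String[] WEKA_JARS = {"weka.jar", "weka-src.jar"};

    // wekaHome may be null when the Weka optimization is not used.
    static String build(String plasticHome, String wekaHome) {
        List<String> entries = new ArrayList<>();
        for (String jar : PLASTIC_JARS) {
            entries.add(plasticHome + File.separator + "packages" + File.separator + jar);
        }
        if (wekaHome != null) {
            for (String jar : WEKA_JARS) {
                entries.add(wekaHome + File.separator + jar);
            }
        }
        // File.pathSeparator is ';' on Windows and ':' on Unix-like systems.
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String plasticHome = args.length > 0 ? args[0] : ".";
        String wekaHome = args.length > 1 ? args[1] : null;
        System.out.println("set CLASSPATH=" + build(plasticHome, wekaHome));
    }
}
```

On Windows, the printed line can be pasted into the command window before starting Plastic.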
2. Running Plastic
Unzip the Plastic.zip file into any convenient directory. You are now ready to run
Plastic. The commands are summarized below:
1. compile.bat
This batch file compiles Plastic.
2. runplastic.bat
This batch file runs Plastic.
C) Code Design
The Plastic package contains the following directories:
Directory PLASMA contains the source code.
Directory help contains help for understanding the interface components of Plastic.
Directory sql4j contains the code for the sql4j parser.
Directory package contains external Java packages.
Directory dat contains the Settings.dat file, which persistently stores the settings
of the database connection.
Directory data contains the data used to build the decision tree classifier.
Directory input contains some sample benchmark queries (currently select-project-join
(SPJ) queries).
Directory pic contains the logo of Database Systems Lab (DSL).
Installation for Oracle on Windows:
The following procedure is known to work with Oracle 9i on Windows XP Version 2002,
Service Pack 1.
Note: The user account in which Plastic is set up must have administrative privileges.
A) Steps to initialize the System
1. Install Oracle with complete support for Java and documentation.
Download Sites:
To download Oracle 9i for Windows:
http://www.oracle.com/technology/software/htdocs/devlic.html?http://otn.oracle.com/software/products/oracle9i/htdocs/winsoft.html
For the latest version of Java and its documentation:
http://java.sun.com/j2se/1.4.2/download.html
2. Create a database, say 'tpch', which will contain the generated tables.
To load the tables into the database, use Oracle's bulk-loading utility, SQL*Loader
(sqlldr). To use sqlldr, first create a control file with the following contents.
The file should have a '.ctl' extension, for example part.ctl.
load data INFILE 'part.tbl' INTO TABLE part
FIELDS TERMINATED BY '|'
(P_PARTKEY, P_NAME, P_MFGR, P_BRAND, P_TYPE, P_SIZE,
P_CONTAINER, P_RETAILPRICE, P_COMMENT )
Then run sqlldr:
sqlldr <user_name>/<password> control=part
Here, control=part specifies that the control file is part.ctl. Because the default
control file extension is .ctl, only the file name needs to be given.
Note:
While loading data into tables, it is preferable not to define indexes, primary keys,
or foreign keys beforehand, since maintaining them slows down insertion. They can be
added after the data is loaded, using 'alter table' for the key constraints and
'create index' for additional indexes.
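Writing one control file per table by hand is repetitive, so the control files can be generated. The following sketch (Java 8 or later) reproduces the part.ctl layout above for any table and column list. The class name SqlLoaderControlFiles is hypothetical. Tables with DATE columns (orders and lineitem) would also need a date mask such as DATE "YYYY-MM-DD" on those fields, and this sketch does not add one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes SQL*Loader control files in the same form as part.ctl.
class SqlLoaderControlFiles {
    // Same layout as the part.ctl example: '|'-separated fields read from <table>.tbl.
    static String controlFile(String table, List<String> columns) {
        return "LOAD DATA INFILE '" + table + ".tbl' INTO TABLE " + table + "\n"
                + "FIELDS TERMINATED BY '|'\n"
                + "(" + String.join(", ", columns) + ")\n";
    }

    // Command that loads the table; SQL*Loader supplies the default .ctl extension.
    static String sqlldrCommand(String user, String password, String table) {
        return "sqlldr " + user + "/" + password + " control=" + table;
    }

    // Writes <table>.ctl into dir and returns its path.
    static Path write(Path dir, String table, List<String> columns) throws IOException {
        Path ctl = dir.resolve(table + ".ctl");
        Files.write(ctl, controlFile(table, columns).getBytes(StandardCharsets.US_ASCII));
        return ctl;
    }

    public static void main(String[] args) throws IOException {
        List<String> partColumns = Arrays.asList(
                "P_PARTKEY", "P_NAME", "P_MFGR", "P_BRAND", "P_TYPE",
                "P_SIZE", "P_CONTAINER", "P_RETAILPRICE", "P_COMMENT");
        Path ctl = write(Paths.get("."), "part", partColumns);
        System.out.println("Wrote " + ctl);
        System.out.println(sqlldrCommand("<user_name>", "<password>", "part"));
    }
}
```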
3. Create the explain plan table.
This is the table (PLAN_TABLE) in which the optimizer stores information about the
plan it selects. See the information about the Oracle plan table at
http://www.adp-gmbh.ch/ora/explainplan.html.
Example:
Connect to the 'tpch' database as the user who owns the tables and run the
utlxplan.sql script. In SQL*Plus, '?' stands for the Oracle home directory:
sqlplus <User-name>/<Password> @?\rdbms\admin\utlxplan.sql
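With PLAN_TABLE in place, the plan the optimizer chooses for a query can be read back over JDBC. The following is a hedged sketch (Java 8 or later), not Plastic's own code. It assumes PLAN_TABLE was created by utlxplan.sql in the connected user's schema. The class name, the statement ID argument and the set of selected columns are illustrative choices.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: explain a query in Oracle and read its plan from PLAN_TABLE.
class OracleExplainSketch {
    // EXPLAIN PLAN writes the chosen plan into PLAN_TABLE without running the query.
    static String explainStatement(String statementId, String query) {
        // Double any quotes so the ID stays a valid SQL string literal.
        return "EXPLAIN PLAN SET STATEMENT_ID = '" + statementId.replace("'", "''")
                + "' FOR " + query;
    }

    static final String DELETE_OLD = "DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = ?";
    static final String PLAN_ROWS =
            "SELECT ID, OPERATION, OPTIONS, OBJECT_NAME, COST "
            + "FROM PLAN_TABLE WHERE STATEMENT_ID = ? ORDER BY ID";

    // Returns one line per plan operator: id, operation, options, object, cost.
    static List<String> explain(Connection conn, String statementId, String query)
            throws SQLException {
        // Remove rows left over from an earlier run with the same statement ID.
        try (PreparedStatement del = conn.prepareStatement(DELETE_OLD)) {
            del.setString(1, statementId);
            del.executeUpdate();
        }
        try (Statement st = conn.createStatement()) {
            st.execute(explainStatement(statementId, query));
        }
        List<String> lines = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(PLAN_ROWS)) {
            ps.setString(1, statementId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    String options = rs.getString("OPTIONS");
                    String object = rs.getString("OBJECT_NAME");
                    lines.add(rs.getInt("ID") + " " + rs.getString("OPERATION")
                            + (options == null ? "" : " " + options)
                            + (object == null ? "" : " " + object)
                            + " cost=" + rs.getString("COST"));
                }
            }
        }
        return lines;
    }
}
```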
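4. Plastic assumes that detailed statistics are available for all the tables in the
database. These statistics can be gathered in SQL*Plus with the following command:
EXEC dbms_stats.gather_schema_stats(ownname => '<owner_name>', -
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', -
cascade => TRUE);
A hyphen at the end of a line continues a SQL*Plus command such as EXEC on the next
line. Alternatively, type the whole command on one line. Here, method_opt gathers
column statistics on all indexed columns and lets Oracle decide which of them need
histograms, and cascade => TRUE also gathers statistics on the indexes.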
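The same statistics can also be gathered from Java, for example right after loading the data, by calling DBMS_STATS through a JDBC CallableStatement. The following is a minimal sketch (Java 8 or later). It assumes an open connection whose user has the privileges needed to gather statistics on the schema. The class name OracleSchemaStats is hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Locale;

// Hypothetical helper: gathers schema statistics with the options used in the notes.
class OracleSchemaStats {
    // Anonymous PL/SQL block; the owner name is passed as a bind variable.
    static final String GATHER_SCHEMA_STATS =
            "BEGIN dbms_stats.gather_schema_stats(ownname => ?, "
            + "method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', "
            + "cascade => TRUE); END;";

    static void gather(Connection conn, String owner) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(GATHER_SCHEMA_STATS)) {
            // Oracle stores unquoted user names in upper case.
            cs.setString(1, owner.toUpperCase(Locale.ROOT));
            cs.execute();
        }
    }
}
```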
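5. Additional settings (optional)
ALTER SYSTEM SET optimizer_max_permutations = 80000 SCOPE=SPFILE
This sets the maximum number of join permutations that the optimizer considers for a
given query to 80000. Because of SCOPE=SPFILE, the new value is stored in the server
parameter file and takes effect after the instance is restarted.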
B) Execution
1. Required Configuration
a. Java(TM) 2 Runtime Environment, version 1.4.x or higher.
The code was tested with Java version 1.4.2_05 on Windows XP Version 2002,
Service Pack 1.
b. Set the CLASSPATH variable. It should contain the following:
Java installation packages: the jar files of the JRE system library, which are in
the directories j2re1.4.2_05\lib and j2re1.4.2_05\lib\ext.
Jar files in the directory 'packages':
db2java.zip, classes111.zip, sql4j.jar, gnu-regexp.jar.
c. Include the lib directory of Oracle in the LD_LIBRARY_PATH variable (on Windows,
native libraries are located through the PATH variable instead).
d. As an optimization, Plastic uses the C4.5 decision tree classifier.
To use this optimization, install the weka-3-4 package and add the files weka.jar
and weka-src.jar to the CLASSPATH.
Weka is a Java-based collection of machine learning algorithms for solving real-world
data mining problems.
Download Site:
http://prdownloads.sourceforge.net/weka/weka-3-4-2.exe?download
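The driver archives on the CLASSPATH are what a Java program uses to reach the database. The following is a hedged sketch (Java 8 or later) of opening connections with them. It uses the class names of the DB2 "app" driver in db2java.zip and the Oracle driver in classes111.zip. The host, port, SID and the class name PlasticConnections are placeholders, not Plastic's configuration. Of Oracle's drivers, only the OCI driver needs the native libraries from item c. The thin driver used here is pure Java.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: JDBC URLs and connections for the two supported databases.
class PlasticConnections {
    // DB2 type-2 ("app") driver from db2java.zip; needs the DB2 client libraries.
    static final String DB2_DRIVER = "COM.ibm.db2.jdbc.app.DB2Driver";
    // Oracle JDBC driver from classes111.zip.
    static final String ORACLE_DRIVER = "oracle.jdbc.driver.OracleDriver";

    // The DB2 app driver addresses a locally catalogued database by name.
    static String db2Url(String database) {
        return "jdbc:db2:" + database;
    }

    // Oracle thin driver URL: host, listener port and SID.
    static String oracleThinUrl(String host, int port, String sid) {
        return "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid;
    }

    // Older drivers like these must be loaded explicitly before DriverManager can use them.
    static Connection connect(String driverClass, String url, String user, String password)
            throws ClassNotFoundException, SQLException {
        Class.forName(driverClass);
        return DriverManager.getConnection(url, user, password);
    }
}
```

For example, connect(ORACLE_DRIVER, oracleThinUrl("localhost", 1521, "tpch"), user, password) would open a connection to a database with SID tpch on Oracle's default listener port, 1521.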
2. Running Plastic
Unzip the Plastic.zip file into any convenient directory. You are now ready to run
Plastic. The commands are summarized below:
1. compile.bat
This batch file compiles Plastic.
2. runplastic.bat
This batch file runs Plastic.
C) Code Design
The Plastic package contains the following directories:
Directory PLASMA contains the source code.
Directory help contains help for understanding the interface components of Plastic.
Directory sql4j contains the code for the sql4j parser.
Directory package contains external Java packages.
Directory dat contains the Settings.dat file, which persistently stores the settings
of the database connection.
Directory data contains the data used to build the decision tree classifier.
Directory input contains some sample benchmark queries (currently select-project-join
(SPJ) queries).
Directory pic contains the logo of Database Systems Lab (DSL).