Trio Database System Management

Table of contents:
1.0 Understanding the Trio Database System
    Trio System Architecture
    Uncertainty-Lineage Database (ULDB)
    TriQL: The Trio Query Language
2.0 System Requirements for Installing the Trio System
3.0 Source Codes/Packages for Trio System
4.0 Procedure of Trio System Installation
    Python
    Readline
    Ctypes
    PostgreSQL
    Graphviz
    Easy_install
    PyGreSQL
    Pylons
    PLY
    PyParsing
    PyDot
    Trio API
5.0 Configuration of Trio System
    Windows superuser authentication to access PostgreSQL
    TrioExplorer
    TrioPlus
6.0 Experiment Trio DBMS Using TrioExplorer and TrioPlus
    TrioExplorer (Home, Schema, Script, and Help)
    TrioPlus
7.0 Trio API Integrated Into Other Python Scripts
    TrioCnx
    TrioCursor
    XTuple
    Alternative
8.0 Trio API and Translator (Python)
9.0 Trio Query Language (TriQL) Structure
    Current support for TriQL
    Drop index/table
    Create Trio table/index
    TriQL language (uncertainty, Compute confidences)
    Table of TriQL contents
10.0 Advantages and Disadvantages of Using Trio DBMS
    Advantages
    Disadvantages
11.0 Documenting Trio System Report/Bugs
12.0 Written Components
    Turn-in documents
    Demo and presentation
13.0 References
1.0 Understanding the Trio Database System:
Trio is a new kind of database management system (DBMS), developed by a Stanford University lab in
December 2006. It is based on an extended relational model called the Uncertainty-Lineage Database
(ULDB) [1] and supports its own query language, TriQL [1]. This database technology handles structured
data, uncertainty of data, and data lineage together in a fully integrated manner. To understand how
Trio DBMS differs from other systems, we look at its architecture and design, the ULDB data model,
and TriQL.
1.1 Trio System Architecture:
M. Mutsuzaki, M. Theobald, et al. [1] describe the Trio-1.0 system architecture, design, and data
structures in detail. As seen in Fig 1, Trio DBMS comprises four primary components: the command-line
client (TrioPlus), TrioExplorer, the Trio API and translator (Python), and a standard relational DBMS
(PostgreSQL). The relational DBMS in turn holds four essential elements: encoded data tables, lineage
tables, Trio metadata, and Trio stored procedures. Together these four elements implement the
Uncertainty-Lineage Database (ULDB) [1], which handles uncertainty and lineage data during execution.
Trio is more flexible than a regular DBMS because, as stated in [1], “The Trio API accepts TriQL queries
in addition to regular SQL, and query results may be x-tuples in the ULDB model as well as regular tuples.”
(Fig 1) System architecture [1]
1.2 Uncertainty-Lineage Database (ULDB)
ULDB, the Uncertainty-Lineage Database, is described in [2] as “the first database formalism to
integrate uncertainty and lineage”. Handling uncertainty and lineage together is the key idea behind
Trio DBMS. What are uncertainty and lineage? According to O. Benjelloun, A. Das Sarma, et al. [2],
uncertainty is “…captured by tuples that may include several alternative possible values, with optional
confidence values associated with each alternative,” and lineage “…associates with a data item
information about its derivation.” With these concepts in place, the next step is the ULDB data model,
which extends the standard SQL relational model with alternatives, maybe (‘?’) annotations, numerical
confidences, and lineage [2]. In the following discussion and demo, I use the uncertain tables
Drives (person, color, car) and Saw (witness, color, car), with and without confidences, as examples.
In TrioExplorer, a table shown in blue is an uncertain table and a table shown in orange is an uncertain
table with confidences. The source code is listed in Section 9.0, Trio Query Language (TriQL) Structure.
1.2.1 Alternatives
A ULDB relation (x-relation) consists of x-tuples, and each x-tuple has one or more alternatives. As
defined in [2], alternatives represent uncertainty about the contents of a tuple. For example, in Fig 3
the records for Frank and Jimmy are x-tuples with more than one alternative, marked by the ‘||’
annotation. Clicking on ‘||’ splits the x-tuple into its alternatives, as shown for both Frank and
Jimmy in Fig 4. Each alternative is a regular tuple. Regular tuples are combined into one x-tuple
because they are alternative possibilities for the same record and share values in particular fields.
(Fig 2) uncertainty tables
(Fig 3) Drives table
(Fig 4) Drives table
1.2.2 ‘?’ (Maybe) Annotations
A ‘?’ annotation indicates that the existence of the x-tuple itself is uncertain; such an x-tuple is
called a maybe x-tuple [2]. For example, in Fig 3 there is a ‘?’ on the right side of Billy’s record:
Billy may or may not drive a blue Honda. The ‘?’ applies to the x-tuple as a whole, so it covers all of
its alternatives.
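The combined effect of alternatives and ‘?’ can be illustrated by enumerating the possible instances that an uncertain table stands for. The following is a minimal Python sketch, not Trio code: the list-of-dicts layout and the function name possible_instances are assumptions made for illustration, and the data mirrors the Drives table without confidences.

```python
from itertools import product

# Hypothetical in-memory model of the Drives x-relation (not Trio's encoding).
# Each x-tuple has a list of alternatives and a flag for the '?' annotation.
DRIVES = [
    {"alts": [("Frank", "red", "Toyota"), ("Frank", "blue", "Toyota")], "maybe": False},
    {"alts": [("Billy", "blue", "Honda")], "maybe": True},
    {"alts": [("Jimmy", "green", "Mazda"), ("Johnny", "green", "Mazda")], "maybe": False},
]

def possible_instances(xrelation):
    """Return every regular relation the x-relation can represent."""
    choices = []
    for xt in xrelation:
        options = list(xt["alts"])
        if xt["maybe"]:
            options.append(None)  # '?' means the x-tuple may be absent
        choices.append(options)
    # Pick one option per x-tuple; drop the "absent" placeholders.
    return [[t for t in combo if t is not None] for combo in product(*choices)]

instances = possible_instances(DRIVES)
print(len(instances))
for instance in instances:
    print(instance)
```

Each x-tuple contributes one independent choice, so the number of instances is the product of the number of options per x-tuple.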
1.2.3 Numerical Confidences
Numerical confidences can be regarded as probabilities [2]. Fig 5 shows the tables with confidences. When
we view the tuples of the Drives table, a number attached to each x-tuple indicates its confidence, as
in Fig 6: Frank has 1.0, Johnny/Jimmy has 1.0, and Billy has 0.9. Fig 7 shows the detailed probability
of each alternative. The confidences of the alternatives of one x-tuple must sum to at most 1.0 [2]. In
this example, I inserted the data together with confidences into the Drives table. Optionally,
TrioExplorer computes the confidences as probabilities if none are available.
(Fig 5) w/confidence tables
(Fig 6) Drives table
(Fig 7) Drives table
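The rule that the confidences of one x-tuple sum to at most 1.0 can be checked mechanically. The sketch below is illustrative Python, not part of Trio; the dict layout and helper names are assumptions, while the values are the ones inserted into Drives in Section 9.3. It also reports the leftover probability, which is read here as the chance that the x-tuple is absent (Billy's 0.9 leaves 0.1).

```python
# Confidence values per alternative, as inserted into Drives in Section 9.3.
DRIVES_CONF = {
    "Frank": [(("Frank", "red", "Toyota"), 0.7), (("Frank", "blue", "Toyota"), 0.3)],
    "Billy": [(("Billy", "blue", "Honda"), 0.9)],
    "Jimmy/Johnny": [(("Jimmy", "green", "Mazda"), 0.4), (("Johnny", "green", "Mazda"), 0.6)],
}

def xtuple_confidence(alternatives):
    """Confidence of an x-tuple: the sum of its alternatives' confidences."""
    # Rounding hides floating-point noise such as 0.7 + 0.3 != 1.0 exactly.
    return round(sum(conf for _, conf in alternatives), 10)

def check_xtuple(alternatives):
    """Return (total, leftover); raise if the total exceeds 1.0."""
    total = xtuple_confidence(alternatives)
    if total > 1.0:
        raise ValueError("confidences of one x-tuple must sum to at most 1.0")
    return total, round(1.0 - total, 10)

for name, alts in DRIVES_CONF.items():
    print(name, check_xtuple(alts))
```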
1.2.4 Lineage
Lineage, as defined in [2], is “recorded at the granularity of tuple alternatives: Lineage connects an
x-tuple alternative to other x-tuple alternatives.” Specifically, lineage is a function that maps each
alternative to the set of alternatives from which it was derived. For example, when the query ‘select
person from Drives’ is executed in TrioExplorer, the returned result is shown in Fig 8. The blue arrow
in the picture represents the lineage. Clicking on it shows that the Billy result was derived from the
complete Billy record in Drives. With this function built into the DBMS, users can trace how each query
result was derived and better understand the relationships in the data.
(Fig 8) Drives table
(Fig 9) Drives table
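Lineage as a function from an alternative to the alternatives it was derived from can be sketched with a plain dictionary. This is an illustrative Python model only; the table names Result and Suspects, the numeric alternative ids, and both helper functions are invented for the example and do not reflect how Trio stores lineage.

```python
# Hypothetical lineage store: each derived alternative maps to the
# (table, alternative-id) pairs it was derived from. All ids are made up.
LINEAGE = {
    ("Result", 1): [("Drives", 11)],
    ("Result", 2): [("Drives", 12)],
    ("Suspects", 5): [("Result", 1), ("Saw", 21)],
}

def get_lineage(alt):
    """Immediate lineage of an alternative; base alternatives have none."""
    return LINEAGE.get(alt, [])

def trace_lineage(alt):
    """Follow lineage transitively and return the base alternatives reached."""
    parents = get_lineage(alt)
    if not parents:
        return {alt}  # no recorded derivation: this is base data
    base = set()
    for parent in parents:
        base |= trace_lineage(parent)
    return base

print(get_lineage(("Suspects", 5)))
print(trace_lineage(("Suspects", 5)))
```

The immediate lookup corresponds to following one blue arrow, while the transitive trace keeps following arrows until only base-table alternatives remain.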
1.3 TriQL: The Trio Query Language
TriQL was developed specifically for querying Trio DBMS. It has two major parts: regular SQL syntax, and
built-in functions and predicates for querying confidence values and lineage [2]. As mentioned in [1],
TriQL queries over ULDBs are translated automatically into SQL queries over the encoded tables. The
TriQL language and its structure are not discussed here; they are covered with a demo in Section 9.0,
Trio Query Language (TriQL) Structure.
2.0 System Requirement of Installing Trio System
Knowing the system requirements before installation is an important step, because it makes the
installation of the Trio system much easier and avoids compatibility problems. The requirements mainly
concern the operating system, the version of the database server (PostgreSQL), and Python. The web page
in [3] lists the operating systems on which the Trio system has been run successfully: Linux, Mac OS X,
and Win-32 (XP, Vista, and 32-bit Server). PostgreSQL, the database server underlying the Trio system,
is available from www.postgresql.org in various versions, including 8.2.5, 8.1.10, 8.0.14, and 7.4.18
[4]. According to the PostgreSQL website, these releases support Linux and Win32. The recommended
version for the Trio system is PostgreSQL 8.1.10. The last requirement is Python, which runs on Windows,
Linux/Unix, Mac OS X, OS/2, and Amiga [5]. In the Trio system installation section, I used Windows XP
Professional as the operating system on which to install the Trio system.
3.0 Source Codes/packages for Trio System
The page http://dbpubs.stanford.edu:8011/doku.php/trio:installation lists 12 packages needed to
complete the installation of the Trio system. Each package has its own website, which offers different
versions of the source code together with installation instructions. This section lists all of these
resources so that users can install the system properly. (Please use the listed versions of the packages.)
3.1 Listing source codes:
1. Python 2.4 can be downloaded from http://www.python.org/ .
2. Easy_install can be downloaded from http://peak.telecommunity.com/DevCenter/EasyInstall
(the file is called ez_setup.py).
3. Readline 1.7.win32 can be downloaded from http://www.python.org/ .
4. ctypes-1.0.2.win32-py2.4 can be downloaded from http://www.python.org/ .
5. PostgreSQL 8.1 can be downloaded from http://www.postgresql.org/ .
6. Graphviz 2.14 is the only version compatible with Trio API. It is available in
http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .
7. PyGreSQL can be downloaded from http://www.pygresql.org/ .
8. Pylons 0.9.5 can be downloaded from http://pylonshq.com/ .
9. PLY 2.2 can be downloaded from http://www.dabeaz.com/ply/ .
10. PyParsing can be downloaded from http://pyparsing.wikispaces.com/ .
11. PyDot can be downloaded from http://code.google.com/p/pydot/downloads/list .
12. Trio API 1.0 can be downloaded from http://infolab.stanford.edu/~theobald/sources/TRIO.zip .
4.0 Procedure of Trio system installation
Since Trio is deployed on top of the packages above, installing them one by one in the order below is
easiest, because some packages depend on others. For convenience, most of the required Python modules
(Pylons, PyGreSQL, etc.) can be installed via ''Easy Install'' on Windows platforms. Most of the
procedures below are taken from [3] and modified as needed.
4.1. Python
o Download the Python 2.4 Windows version (python-2.4.4.msi).
o Install Python in the C:/Python24 directory.
o Set path=c:/Python24; in environment variables.
4.2. Readline - download Readline-1.7.win32-py2.4.exe and install into Python directory
4.3. Ctypes – download ctypes-1.0.2.win32-py2.4.exe and install into Python directory
4.4. PostgreSQL
o Download PostgreSQL 8.1 windows version (postgresql-8.1.msi )
o Install PostgreSQL 8.1 as follows:
   ▪ Language selection (Fig 10) – English
   ▪ Introduction screen (Fig 11) – next
   ▪ Welcome message and instructions (Fig 12) – next
   ▪ Feature selection (Fig 13) – next
   ▪ Service installation (Fig 14) – check install a service, input account name ‘postgres’ and password.
   ▪ Initdb (Fig 15) – check initialize database cluster, super username and password.
   ▪ Procedural languages (Fig 16) – check PL/pgsql only.
   ▪ Contrib modules (Fig 17) – check Admin81 only.
   ▪ Next in (Fig 18, 19, 20, 21) to complete the installation.
o Set path C:\Program Files\PostgreSQL\8.1\bin; after completing the installation.
(Fig 10) [4]
(Fig 11) [4]
(Fig 12) [4]
(Fig 13) [4]
(Fig 14) [4]
(Fig 15) [4]
(Fig 16) [4]
(Fig 17) [4]
(Fig 18) [4]
(Fig 19) [4]
(Fig 20) [4]
(Fig 21) [4]
4.5. Graphviz – Download Graphviz 2.14 and install it on your workstation. After the installation is
complete, set path C:\PROGRA~1\ATT\Graphviz\bin; in environment variables.
4.6. Easy_install – Download ez_setup.py into the C:/ directory.
4.7. PyGreSQL – At the command line, cd to the c:\ directory and run ‘python ez_setup.py PyGreSQL’ to
install the components.
4.8. Pylons – At the command line, cd to the c:\ directory and run ‘python ez_setup.py Pylons==0.9.5’ to
install Pylons. Set path c:\python24\Scripts in environment variables.
4.9. PLY – At the command line, cd to the c:\ directory and run ‘python ez_setup.py Ply==2.2’ or
‘easy_install Ply==2.2’.
4.10. PyParsing – At the command line, cd to the c:\ directory and run ‘python ez_setup.py PyParsing’.
4.11. PyDot – Download the source from the website, change to its folder at the command line, and
install it manually by running ‘python setup.py install’.
4.12. Trio API
o Download the source code into any directory
o Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and rename it to triospi.dll
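After working through the steps above, a quick check of what is importable and what is on the path can save time before configuring Trio. The following Python sketch is not part of the Trio distribution. It uses modern Python standard-library calls (importlib.util.find_spec, shutil.which) that are not available in Python 2.4, and the import names are assumptions: PyGreSQL is assumed to be importable as pg, and the other packages under their lowercase names.

```python
import importlib.util
import shutil

# Import names are assumptions (see the note above), not taken from [3].
REQUIRED_MODULES = ["readline", "ctypes", "pg", "pylons", "ply", "pyparsing", "pydot"]
# Executables expected on the path after the PostgreSQL and Graphviz steps.
REQUIRED_TOOLS = ["psql", "dot"]

def check_modules(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

def check_tools(names):
    """Map each executable name to True if it is found on the path."""
    return {name: shutil.which(name) is not None for name in names}

for name, found in check_modules(REQUIRED_MODULES).items():
    print("module %-10s %s" % (name, "ok" if found else "MISSING"))
for name, found in check_tools(REQUIRED_TOOLS).items():
    print("tool   %-10s %s" % (name, "ok" if found else "MISSING"))
```

A missing entry points back to the corresponding installation step or to a path setting that was skipped.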
5.0 Configuration of Trio System
5.1. Windows superuser authentication to access PostgreSQL
o Open start->all programs->PostgreSQL 8.1->pgAdmin III
o Right-click on Login Roles to create a new login role (see Fig 22)
   ▪ Role name ‘myname’ (the same as the Windows login account)
   ▪ Set a password (any password will do)
   ▪ Check all role privileges and click OK.
o Right-click on Databases to create a new database (see Fig 23)
   ▪ Database name ‘myname’ (the same as the username)
   ▪ Owner is ‘myname’; click OK
o Initialize the Trio schema information (see Fig 24)
   ▪ In Trio-1.0\setup, open setup.py with Notepad, comment out the last three lines of code, and put
     the following lines in their place. (After the initialization is complete, please change the file
     back to the original.)
os.system("psql %s %s < setup.sql" % (pgdbname, username))
os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username))
os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))
   ▪ Save the file, then at the command line cd to \Trio-1.0\setup and run ‘python setup.py myname
     myname’.
   ▪ Provide the password to create the schema.
(Fig 22)
(Fig 23)
(Fig 24)
5.2. TrioExplorer [3]
o Make sure PostgreSQL is running.
o Start TrioExplorer – Ensure that the path ‘c:\python24\Scripts;’ is set in environment variables, then
double-click ‘start_te_server.bat’ under Trio-1.0\explorer.
o At the command line, you are now prompted for an admin user login to PostgreSQL, which should have
been created along with your PostgreSQL installation and which will be used by TrioExplorer to create
new user roles and database instances.
o TrioExplorer should now be reachable from your browser at http://localhost:8080/. New users can now
press ‘Create a new user’ to create their own Trio login and database instances, which are then managed
by the PostgreSQL server.
5.3. TrioPlus [3] – To use TrioPlus, a new user role and database instance have to be set up manually.
They should be different from the default PostgreSQL instance, and creating them requires logging in to
the PostgreSQL server with admin privileges.
o Create a new PostgreSQL user role and database instance
   ▪ Run ‘createuser demo’
   ▪ Run ‘createdb demo’; the database name must be the same as the username
o Initialize the Trio schema information for the new user in the same way as described in Section 5.1
(Windows superuser authentication to access PostgreSQL). With TrioExplorer this is easier: just press
‘Create new role’ in the web interface.
o Connect to the new Trio database with the command-line client by running ‘python trioplus.py -u demo
-d demo -p’
6.0 Experiment Trio DBMS Using TrioExplorer And TrioPlus
6.1 TrioExplorer:
Fig 25 shows the TrioExplorer interface on the Home tab after logging in with a username and password.
The left side of the web page lists all the symbols used to show the types of tables and records. The
body of the page contains a text area for entering TriQL. After pressing ‘execute’, the result appears
below it.
(Fig 25)
Fig 26 shows the Schema tab in TrioExplorer. Once the tables of the database have been created, the
diagram shows the relationships between the tables, the table types, and the attributes. The left side
lists the symbols used in the diagram and their meanings.
(Fig 26)
Fig 27 shows the Sample tab. It is used to load the sample script that the application provides for
testing purposes. After the sample script is loaded, the application automatically executes its
queries in the DBMS. A copy of the sample script is also listed in the body of the page.
(Fig 27)
Fig 28 shows the Script tab in TrioExplorer, which is used to load a script from a file on your
workstation into the application for execution. After it has loaded successfully, you can click on a
line of the script to put it into the text area and execute it in the DBMS.
(Fig 28)
Fig 29 shows the Help tab, which helps users with common questions such as how to use the tabs, how to
create a database, and more.
(Fig 29)
6.2 TrioPlus
As seen in Fig 30, change to the Trio-1.0/ directory at the command line and type ‘python trioplus.py
-u myname -d myname -p’ to connect to the PostgreSQL server. Please be aware that the myname account
has to be created in Trio DBMS first. Once the connection with PostgreSQL is established, you can
execute any TriQL statement at the ‘>’ prompt, just as in the text area of TrioExplorer.
(Fig 30)
7.0 Trio API Integrated Into Other Python Scripts
The Trio API provides many methods in Triodb.py and xtuple.py under the Trio-1.0 directory for
integration into other Python scripts. Importing both modules into a Python script gives it the
capability to handle tuple alternatives, confidence computation, and lineage. As the source shows, the
Trio API methods were designed to work much like the Python DB-API interface [3]. The Trio API methods
described below are taken from [3].
7.1 TrioCnx
7.1.1. TrioCnx(pgdb) – This constructor method creates a new Trio connection from a given
PyGreSQL connection pgdb, the default Python DB-API.
7.1.2. cursor() – return a new TrioCursor object for the current connection.
7.1.3. commit() – commits the current transaction.
7.1.4. rollback() – performs a rollback for the current transaction.
7.1.5. close() – closes the Trio connection (and the underlying pgdb connection)
7.2 TrioCursor
7.2.1. execute(triql) - Executes a TriQL statement triql for this cursor object.
7.2.2. fetchone() - Fetches a single XTuple object from the current cursor position.
7.2.3. fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor
position.
7.3 XTuple
7.3.1. len() - Returns the number of Alternative objects contained in this XTuple object.
7.3.2. getAlternative(idx) - Returns the Alternative object at the designated index idx.
7.3.3. getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its
Alternative objects' confidence values.
7.3.4. getQuestionMark() - Returns whether this XTuple object has a question mark or not.
7.4 Alternative
7.4.1 getLineage() – returns a list of immediate lineage information as (source-table, source-aid)
pairs of this alternative.
7.4.2 traceLineage() – performs a transitive lineage traversal for this alternative back to the base
data
7.4.3 getConfidence() - Returns the confidence value (if any) of this alternative.
7.4.4 computeConfidence() - Computes the confidence value of this alternative based on the
traceLineage() function.
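The way these methods fit together can be sketched without a Trio installation. The two classes below are stand-ins written for this example; they only mimic the method names listed above and are not the classes from Triodb.py or xtuple.py. With a real installation, the list of XTuple objects would instead come from TrioCursor.fetchall() after execute(triql). The helper summarize and the values attribute are hypothetical.

```python
class Alternative:
    """Stand-in mimicking the Alternative methods listed above (not Trio's class)."""
    def __init__(self, values, confidence=None):
        self.values = values            # hypothetical attribute holding the field values
        self._confidence = confidence

    def getConfidence(self):
        return self._confidence

class XTuple:
    """Stand-in mimicking the XTuple methods listed above (not Trio's class)."""
    def __init__(self, alternatives, question_mark=False):
        self._alts = list(alternatives)
        self._question_mark = question_mark

    def __len__(self):
        return len(self._alts)

    def getAlternative(self, idx):
        return self._alts[idx]

    def getConfidence(self):
        # Sum of the alternatives' confidences, or None if any is missing.
        confs = [alt.getConfidence() for alt in self._alts]
        if any(c is None for c in confs):
            return None
        return sum(confs)

    def getQuestionMark(self):
        return self._question_mark

def summarize(xtuples):
    """Walk a result list using only the methods described in Section 7."""
    rows = []
    for xt in xtuples:
        alts = [xt.getAlternative(i).values for i in range(len(xt))]
        rows.append((alts, xt.getConfidence(), xt.getQuestionMark()))
    return rows

# Stand-in for the list a cursor would return.
result = [
    XTuple([Alternative(("Diane", "red", "Toyota"), 0.2),
            Alternative(("Diane", "blue", "Toyota"), 0.8)]),
    XTuple([Alternative(("Cathy", "red", "Acura"))], question_mark=True),
]
for row in summarize(result):
    print(row)
```

Because summarize touches the objects only through len(), getAlternative(), getConfidence(), and getQuestionMark(), the same loop should carry over to real query results, provided the real classes behave as described in [3].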
8.0 Trio API And Translator (Python)
The Trio API and translator (Python) is one of the important components of the Trio system. Every TriQL
statement entered by a user in either TrioExplorer or TrioPlus is translated into operations on the
encoded data tables, lineage tables, Trio metadata, and/or Trio stored procedures of the ULDB, and is
executed in the relational database. Node.py in the Trio-1.0 directory and lexer.py in the
Trio-1.0\trioparser directory contain the classes, methods, and TriQL keyword definitions that support
and validate TriQL queries through the Trio API.
TriQL built-in functions:
Aggregate(), Alternative(), As(), Avg(), BinaryExpression(), Brackets(), Cascade(), ColumnList(),
Column(), Command(), ComputeConfidences(), Conf(), ConfUniform(), ConfScaled(), Count(), CreateIndex(),
CreateTableAs(), CreateTable(), CreateTempTableAs(), CreateTempTable(), DataType(), Dot(), DropIndex(),
DropTableList(), DropTable(), Eavg(), Ecount(), EcountStar(), Emax(), Emin(), Empty(), Esum(),
FromClause(), GroupAlts(), GroupbyClause(), GroupByKey(), HavingClause(), HorizontalSelect(),
IdentifierList(), InsertList(), Insert(), InsertSpec(), Lineage(), Literal(), Lsum(), Max(), Maybe(),
Min(), OrderBy(), Question(), SelectClause(), Select(), SelectOptions(), SetOperator(), Star(), Sum(),
UnaryExpression(), UncertainSet(), UniformTable(), ViewTable(), WhereClause(), WithConfidences()
TriQL keywords:
'all', 'and', 'any', 'as', 'avg', 'by', 'cascade', 'compute', 'conf', 'confidences', 'count', 'create',
'distinct', 'drop', 'eavg', 'ecount', 'emax', 'emin', 'esum', 'except', 'exists', 'flatten', 'float',
'float4', 'float8', 'from', 'group', 'groupalts', 'having', 'in', 'index', 'insert', 'int', 'int4',
'int8', 'intersect', 'into', 'is', 'like', 'lineage', #'lavg', #'lcount', #'lmax', #'lmin', #'lsum',
'max', 'maybe', 'min', 'noconf', 'nolineage', 'nomaybe', 'not', 'null', 'on', 'or', 'order', 'scaled',
'scaledbyexp', 'select', 'sum', 'table', 'temporary', 'trio', 'trio_aid', 'trio_xid', 'uncertain',
'uniform', 'union', 'merged', 'values', 'varchar', 'view', 'where', 'with'
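To see which of these keywords go beyond regular SQL in a given statement, a rough scan is enough. The sketch below is illustrative Python and does not reproduce lexer.py; the choice of which keywords count as Trio-specific and the function name trio_keywords_in are assumptions made for this example.

```python
import re

# A subset of the keyword list above with no counterpart in standard SQL.
# The selection of this subset is an illustrative choice.
TRIO_KEYWORDS = {
    "compute", "conf", "confidences", "flatten", "groupalts", "lineage",
    "maybe", "merged", "noconf", "nolineage", "nomaybe", "scaled",
    "trio", "uncertain", "uniform",
}

def trio_keywords_in(query):
    """Return the Trio-specific keywords that appear as words in the query."""
    # String literals are removed first so that values like 'Toyota' are not scanned.
    without_literals = re.sub(r"'[^']*'", "", query)
    words = re.findall(r"[A-Za-z_][A-Za-z_0-9]*", without_literals.lower())
    return sorted(set(words) & TRIO_KEYWORDS)

print(trio_keywords_in("Select Merged color From Saw Where car = 'Mazda' or car = 'Toyota'"))
print(trio_keywords_in("create trio table Saw(witness varchar(32), uncertain(color, car)) with confidences"))
```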
9.0 Trio Query Language (TriQL) Structure
As mentioned in [3], “Trio is based on the ULDB model, an extension to the relational model that adds
both uncertainty (U) and lineage (L) as first-class concepts.” Accordingly, the Trio Query Language
(TriQL) is designed for querying and updating a ULDB database through its encoded data tables, lineage
tables, Trio metadata, and Trio stored procedures. As discussed earlier, ULDB extends the standard SQL
relational model, so TriQL contains not only built-in functions to handle uncertainty, query and
compute confidence values, and derive data lineage, but also regular SQL queries.
9.1 Current supported for TriQL
Supported DDL and Insert Commands
• Create Table
• Create Trio Table
• Create Index
• Drop Table, including “Cascade” option to drop derived tables
• Drop Index
• Insert Into T Values <data>
   o With or without alternatives, confidence values, and ?
   o T must be a base table
• Insert Into T <TriQL query>
   o T must be a base table

Supported Subset of TriQL Query Language
• Select-Project-Join queries
• Create Table T as <query>
• Select Distinct
• Union, Intersect, or Except of two subqueries (duplicate-eliminating)
• Union All of two subqueries
• Merged, Flatten, and GroupAlts
• Horizontal subqueries in Where clause
   o Including shortcuts and aggregation/group-by/having
• Horizontal subqueries in Select clause
   o Including shortcuts and aggregation but not group-by/having
• Conf() function with any number of tables or Conf(*)
   o except for Conf(*) in the SELECT clause in conjunction with DISTINCT or MERGED
• Maybe() predicate
• “Uniform <table-name>” in From clause
• “as conf” for query-defined result confidence values
   o Including “Uniform as conf” and “Scaled as conf”
• “Compute Confidences” at end of query
• Lineage() predicate
   o Not in horizontal subqueries
   o “==>” abbreviation allowed
• NoLineage, NoConf, and NoMaybe
Based on the TriQL samples at http://infolab.stanford.edu/~widom/triql.html, I created the Trio tables
and indexes and inserted the uncertain data and confidences for the demo.
9.2 Drop index/table – TriQL uses the ‘drop index’ and ‘drop table’ statements to delete indexes and
tables, in the same way as the standard SQL statements in a regular DBMS.
drop index DRIVES_INDEX;
drop table Drives;
drop index SAW_INDEX;
drop table Saw;
9.3 Create Trio table/index – To create a table that can hold uncertain data, the word ‘trio’ must
appear in the create statement (‘create trio table …’), as below. The ‘uncertain()’ clause in the same
statement defines which fields of the table are treated as uncertain. If the statement ends with ‘with
confidences’, confidences can be inserted into the table and computed during query execution. In the
insert statements, ‘[…|…]’ encloses the alternatives of one x-tuple, a trailing ‘?’ is the maybe
annotation described above, and ‘:number’ is the confidence of an alternative.
--------------------------------------------------------------------
-- Creating tables and indexes, inserting data
--------------------------------------------------------------------
create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car));
create index SAW_INDEX on Saw(color, car);
insert into Saw values [('Amy','blue','Honda') | ('Amy','red','Toyota')] ?;
insert into Saw values [('Betty','green','Mazda') | ('Betty','green','Toyota') | ('Betty','green',NULL)];
insert into Saw values ('Cathy','red','Acura') ?;
insert into Saw values [('Diane','red','Toyota') | ('Diane','blue','Toyota')];
create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car));
create index DRIVES_INDEX on Drives(person,color, car);
insert into Drives values [('Frank','red','Toyota') | ('Frank','blue','Toyota')];
insert into Drives values ('Billy','blue','Honda') ?;
insert into Drives values [('Jimmy','green','Mazda') | ('Johnny','green','Mazda')];
--------------------------------------------------------------------
-- With confidence value
--------------------------------------------------------------------
create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car)) with confidences;
create index SAW_INDEX on Saw(color, car);
insert into Saw values [('Amy','blue','Honda'):0.4 | ('Amy','red','Toyota'):0.3];
insert into Saw values [('Betty','green','Mazda'):0.5 | ('Betty','green','Toyota'):0.2 | ('Betty','green',NULL):0.3];
insert into Saw values [('Cathy','red','Acura'):0.6];
insert into Saw values [('Diane','red','Toyota'):0.2 | ('Diane','blue','Toyota'):0.8];
create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car)) with confidences;
create index DRIVES_INDEX on Drives(person,color, car);
insert into Drives values [('Frank','red','Toyota'):0.7 | ('Frank','blue','Toyota'):0.3];
insert into Drives values [('Billy','blue','Honda'):0.9];
insert into Drives values [('Jimmy','green','Mazda'):0.4 | ('Johnny','green','Mazda'):0.6];
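When the data comes from a program rather than a hand-written script, the insert syntax above can be generated. The following Python sketch builds statements in the same shape as the script; the function names are invented for this example, and escaping a single quote by doubling it is standard SQL practice that is only assumed to be accepted by the TriQL parser.

```python
def sql_literal(value):
    """Render one field: NULL for None, single-quoted text otherwise."""
    if value is None:
        return "NULL"
    # Doubling single quotes is an assumption about what TriQL accepts.
    return "'" + str(value).replace("'", "''") + "'"

def format_alternative(values, confidence=None):
    """Render one alternative, with ':confidence' appended when given."""
    text = "(" + ",".join(sql_literal(v) for v in values) + ")"
    if confidence is not None:
        text += ":" + str(confidence)
    return text

def format_insert(table, alternatives, maybe=False):
    """alternatives is a list of (values, confidence-or-None) pairs."""
    parts = [format_alternative(values, conf) for values, conf in alternatives]
    body = " | ".join(parts)
    # Brackets follow the script above: used for several alternatives
    # or whenever confidences are attached.
    if len(parts) > 1 or any(conf is not None for _, conf in alternatives):
        body = "[" + body + "]"
    if maybe:
        body += " ?"
    return "insert into %s values %s;" % (table, body)

print(format_insert("Saw", [(("Amy", "blue", "Honda"), None),
                            (("Amy", "red", "Toyota"), None)], maybe=True))
print(format_insert("Saw", [(("Cathy", "red", "Acura"), 0.6)]))
```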
9.4 TriQL language
9.4.1 uncertainty
To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS without confidences,
copy the first part of the script above into the text area under the Home tab and press ‘execute’. The
application executes the statements and creates the two tables, Drives and Saw, as seen in Fig 31, with
all the data inserted successfully. The green color of a table indicates uncertainty.
(Fig 31)
Selection – Fig 32 shows the result of executing the TriQL query ‘select * from Saw where car =
'Toyota'’, which selects all records from the Saw table whose car is a Toyota. In the result table, the
blue arrow on the left of a record represents its lineage, and a ‘?’ on the right marks a maybe
x-tuple. The last record, drawn in 3D, is an x-tuple that contains two alternatives.
(Fig 32)
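The selection can be modelled in a few lines of Python. This is a sketch of the ULDB idea, not Trio's implementation: the list-of-dicts layout is an assumption, and so is the rule used here that an x-tuple which loses some of its alternatives in the selection becomes a maybe x-tuple. The output has not been compared against Fig 32.

```python
# Hypothetical in-memory model of the Saw table without confidences.
SAW = [
    {"alts": [("Amy", "blue", "Honda"), ("Amy", "red", "Toyota")], "maybe": True},
    {"alts": [("Betty", "green", "Mazda"), ("Betty", "green", "Toyota"),
              ("Betty", "green", None)], "maybe": False},
    {"alts": [("Cathy", "red", "Acura")], "maybe": True},
    {"alts": [("Diane", "red", "Toyota"), ("Diane", "blue", "Toyota")], "maybe": False},
]

def select(xrelation, predicate):
    """Keep the alternatives that satisfy the predicate, x-tuple by x-tuple."""
    result = []
    for xt in xrelation:
        kept = [alt for alt in xt["alts"] if predicate(alt)]
        if not kept:
            continue  # no alternative qualifies: the x-tuple drops out
        # Assumed rule: losing an alternative makes the result uncertain to exist.
        maybe = xt["maybe"] or len(kept) < len(xt["alts"])
        result.append({"alts": kept, "maybe": maybe})
    return result

toyotas = select(SAW, lambda alt: alt[2] == "Toyota")
for xt in toyotas:
    print(xt)
```

Under this model only Diane's record keeps two alternatives, which matches the description of the last record in the result.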
Projection – Fig 33 shows the result of executing the query ‘Select color From Saw Where car = 'Mazda'
or car = 'Toyota'’. This query finds the colors of sighted Mazdas and Toyotas.
(Fig 33)
Projection (Merged) – Fig 34 shows the result of executing the query ‘Select Merged color From Saw
Where car = 'Mazda' or car = 'Toyota'’. This query also finds the colors of sighted Mazdas and Toyotas.
Merged in the query eliminates horizontal duplicates, that is, duplicated alternatives within an
x-tuple.
(Fig 34)
Join – Fig 35 shows the result of executing the query
‘Select S.witness, D.person as suspect, D.color, D.car
From Drives D, Saw S
Where D.color = S.color and D.car = S.car’.
The query joins the Saw and Drives tables on matching color and car. Adding Merged to a query, as in
‘Select Merged D.person as suspect, D.color
From Drives D, Saw S
Where D.color = S.color’,
helps to eliminate duplicated tuples.
(Fig 35)
Duplicate-Elimination (Distinct) – Fig 36 shows the result of executing the query ‘Select Distinct color, car from Saw’. The query scans the entire Saw table and returns color and car for each tuple. The Distinct keyword eliminates vertical duplicates, that is, duplicate tuples across rows.
(Fig 36)
Flatten – Fig 37 shows the result of executing the query ‘Select Flatten * From Saw’. Flatten turns tuples with alternative values into regular tuples, so the query converts table Saw into a list of sightings [3].
(Fig 37)
GroupAlts - Fig 38 shows the result of executing the TriQL query ‘Select GroupAlts(color) * From Saw’. The query takes a list of alternatives and reorganizes the Saw table so that the data is ‘keyed’ on color, with uncertainty about car and witness [3]. The GroupAlts function is used to create or restructure alternative values.
(Fig 38)
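Flatten and GroupAlts can be read as inverse restructuring steps, which the following Python sketch tries to show. It is an illustration under assumptions, not Trio code: x-tuples are modeled as plain lists of alternatives, `flatten` and `group_alts` are hypothetical helper names, and the Amy row is made up.

```python
from collections import defaultdict


def flatten(xtuples):
    """Turn every alternative into its own regular tuple."""
    return [alt for alts in xtuples for alt in alts]


def group_alts(rows, key_index):
    """Group regular tuples by one attribute. The remaining attributes of
    each group become the alternatives of a single x-tuple."""
    groups = defaultdict(list)
    for row in rows:
        rest = row[:key_index] + row[key_index + 1:]
        groups[row[key_index]].append(rest)
    return dict(groups)


saw = [
    [("Diane", "red", "Toyota"), ("Diane", "blue", "Toyota")],  # from the script above
    [("Amy", "red", "Mazda")],                                   # made-up row
]

sightings = flatten(saw)                      # a flat list of sightings
by_color = group_alts(sightings, key_index=1)  # keyed on color
```

After grouping, the key 'red' carries two alternatives, (Diane, Toyota) and (Amy, Mazda), so the color is certain while witness and car are uncertain.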
Horizontal Subqueries: The [ ] in the Where Clause – Fig 39 shows the result of executing the query ‘Select * From Saw Where 2 <= [Select Count(*) From Saw Where car = 'Toyota']’. The ‘[ ]’ inside the statement is treated as a horizontal subquery, which is evaluated over the alternatives of the current tuple. The query finds all tuples with at least two alternatives involving Toyotas. The [ ] construct can also appear in the Select clause, as in ‘Select Merged witness, [Count(Distinct color)] as #colors From Saw’.
(Fig 39)
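A horizontal subquery can be pictured as a computation that ranges only over the alternatives of the x-tuple currently being examined. The Python sketch below mimics both queries above under that assumption; the helper names and the Amy row are hypothetical, and x-tuples are modeled as plain lists of alternatives.

```python
def at_least_two_toyotas(xtuples):
    """Mimics: Select * From Saw Where 2 <= [Select Count(*) From Saw Where car = 'Toyota']"""
    result = []
    for alts in xtuples:
        # The bracketed count looks only at this x-tuple's own alternatives.
        toyota_count = sum(1 for (_witness, _color, car) in alts if car == "Toyota")
        if 2 <= toyota_count:
            result.append(alts)
    return result


def color_counts(xtuples):
    """Mimics: Select Merged witness, [Count(Distinct color)] as #colors From Saw"""
    result = []
    for alts in xtuples:
        n_colors = len({color for (_witness, color, _car) in alts})
        pairs = [(witness, n_colors) for (witness, _color, _car) in alts]
        result.append(list(dict.fromkeys(pairs)))  # Merged: drop repeated alternatives
    return result


saw = [
    [("Diane", "red", "Toyota"), ("Diane", "blue", "Toyota")],  # from the script above
    [("Amy", "red", "Toyota"), ("Amy", "red", "Mazda")],        # made-up row
]

two_toyotas = at_least_two_toyotas(saw)
counts = color_counts(saw)
```

Only Diane's x-tuple has two Toyota alternatives, so it is the only one returned by the first function.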
Syntactic Shortcuts in [ ] for an implicit table list – Fig 40 shows the result of executing the query ‘Select * From Saw Where color <=All [color]’. Color is one of the attributes of table Saw, and the query performs lexicographic color comparisons. The shortcut in [ ] can also stand in for ‘Select *’, as in ‘Select * From Saw Where Exists [car = 'Toyota']’.
(Fig 40)
9.4.2 Compute Confidences
To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS with confidences, copy the second part of the above script into the text field under the Home tab and press ‘execute’. The application runs the script and creates the two tables, Drives and Saw, with confidences, as seen in Fig 41. The table color changes to orange, which indicates that confidences are present. All the data was inserted successfully.
(Fig 41)
Built-in Function Conf() – Fig 42 shows the result of executing the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(D) > 0.3 And Conf(S) > 0.3’. Conf() takes a single table name or variable from the From clause as a parameter and returns the confidence of the current alternative being evaluated from that table. The query joins the Drives and Saw tables and generates suspects only from alternatives whose confidence is greater than 0.3.
(Fig 42)
Conf() for Multi-table: in the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(S) > 0.3 And Conf(*) > 0.2’, Conf(*) adds a condition that keeps only result alternatives with confidence greater than 0.2.
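The two Conf() queries can be approximated in Python as follows. This is a sketch under stated assumptions: each x-tuple is a list of (alternative, confidence) pairs, the `suspects` helper is hypothetical, the Cathy row is made up, and the confidence of a joined alternative is taken to be the product of its two input confidences, which presumes the base tuples are independent. Trio derives result confidences from lineage, so this product is only a stand-in.

```python
saw = [
    [(("Diane", "red", "Toyota"), 0.2), (("Diane", "blue", "Toyota"), 0.8)],  # from the script
    [(("Cathy", "blue", "Honda"), 0.6)],                                      # made-up row
]
drives = [
    [(("Frank", "red", "Toyota"), 0.7), (("Frank", "blue", "Toyota"), 0.3)],
    [(("Billy", "blue", "Honda"), 0.9)],
    [(("Jimmy", "green", "Mazda"), 0.4), (("Johnny", "green", "Mazda"), 0.6)],
]


def suspects(drives, saw, min_d=0.0, min_s=0.0, min_result=0.0):
    """Join Drives and Saw on color and car, filtering on confidences.

    min_d and min_s play the role of Conf(D) and Conf(S) thresholds,
    min_result the role of a Conf(*) threshold.
    """
    rows = []
    for d_xt in drives:
        for s_xt in saw:
            for (person, d_color, d_car), conf_d in d_xt:
                for (witness, s_color, s_car), conf_s in s_xt:
                    if d_color != s_color or d_car != s_car:
                        continue
                    if not (conf_d > min_d and conf_s > min_s):
                        continue
                    conf_result = conf_d * conf_s  # assumption: independent inputs
                    if conf_result > min_result:
                        rows.append((witness, person, d_color, d_car, conf_result))
    return rows


# ... And Conf(D) > 0.3 And Conf(S) > 0.3
single_table = suspects(drives, saw, min_d=0.3, min_s=0.3)
# ... And Conf(S) > 0.3 And Conf(*) > 0.2
multi_table = suspects(drives, saw, min_s=0.3, min_result=0.2)
```

With this data the first call rejects the Diane and Frank pairings, because one side always has confidence 0.3 or lower, while the second call admits the blue Toyota pairing since it no longer tests Conf(D).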
Built-in function Maybe() – Fig 43 shows the result of executing the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And (Not Maybe(D)) And (Not Maybe(S))’. Maybe() takes a table name or variable from the From clause as a parameter and returns true if and only if the current alternative being evaluated from that table comes from a tuple with a ‘?’.
(Fig 43)
Uniform result confidence – Fig 44 shows the result of executing the query ‘Select *, uniform as conf From Saw’. Used with ‘as conf’, uniform assigns confidence values to a tuple as follows: if the tuple has n alternatives and no ‘?’, each alternative gets confidence 1/n; if the tuple has n alternatives and a ‘?’, each alternative gets confidence 1/(n+1). The query computes and returns these confidences.
(Fig 44)
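The uniform rule is simple enough to state as a few lines of Python. The function name `uniform_conf` is hypothetical, and exact fractions are used instead of floats only so that the values are easy to check.

```python
from fractions import Fraction


def uniform_conf(n_alternatives, has_maybe):
    """Return the confidence assigned to each of the n alternatives."""
    # A '?' is counted as one extra possibility next to the n alternatives.
    outcomes = n_alternatives + 1 if has_maybe else n_alternatives
    return [Fraction(1, outcomes)] * n_alternatives


without_maybe = uniform_conf(2, has_maybe=False)  # each alternative gets 1/2
with_maybe = uniform_conf(2, has_maybe=True)      # each alternative gets 1/3
```

In the second case the two confidences sum to 2/3, which leaves 1/3 for the possibility that the tuple is absent.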
Scaled result confidence – Fig 45 shows the result of executing the query ‘Select *, scaled as conf From Saw’. Used with ‘as conf’, scaled assigns confidence values to a tuple as follows: if the tuple has no confidence values and n alternatives, each alternative gets confidence 1/n; if the tuple has confidence values that sum to s, each alternative gets confidence c/s, where c is the existing confidence value of that alternative. The query computes and returns these confidences.
(Fig 45)
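As a rough illustration of the scaled rule, the sketch below normalizes a list of existing confidence values. The function name `scaled_conf` and the convention of passing None for alternatives without a confidence are assumptions of this example, not part of TriQL.

```python
from fractions import Fraction


def scaled_conf(confs):
    """confs holds one entry per alternative; None means no confidence value."""
    n = len(confs)
    if all(c is None for c in confs):
        # No confidence values at all: fall back to 1/n per alternative.
        return [Fraction(1, n)] * n
    s = sum(confs)
    # Each existing confidence c becomes c/s, so the results sum to 1.
    return [c / s for c in confs]


no_values = scaled_conf([None, None, None])
rescaled = scaled_conf([Fraction(2, 10), Fraction(6, 10)])
```

Here confidences 0.2 and 0.6 sum to s = 0.8 and are rescaled to 1/4 and 3/4.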
9.5 Table of all possible TriQL contents – [3], in its description of the TriQL Query Language (website http://infolab.stanford.edu/~widom/triql.html#options ), lists all built-in functions for querying ULDBs. For convenience, I collected them all in the table below.
Topic | Details
ULDBs | Uncertain attributes, maybe annotations, and confidence values
SQL over ULDBs | Selection, projection, join, subqueries, duplicate-elimination, grouping and aggregation, aggregate variants, set operators, order by
Flatten and GroupAlts | Flatten is used to turn tuples with alternative values into regular tuples, while GroupAlts is used to create or restructure alternative values
Horizontal subqueries: The [ ] construct | [ ] in the Where clause, [ ] with joins, syntactic shortcuts in [ ], [ ] in the Select clause, [ ] with self-joins
Built-in Functions Conf() and Maybe() | Multi-table Conf()
Result confidences | Result confidence evaluation, uniform and scaled result confidences, on-demand confidence computation
Built-in Predicate Lineage() | The Lineage() predicate lets queries filter joined tuples based on whether they are related via lineage
Options NoLineage, NoConf, and NoMaybe | Indicate that lineage, confidence values, and/or ?'s should be omitted from query results
Data modification | Insert statement, delete statement, update statement
10.0 Advantage and Disadvantage of using Trio DBMS
10.1 Advantage
10.1.1 Open source and free support for any non-profit users to experience the new Trio DBMS
Trio DBMS was released by Stanford University in December 2006 and is updated often with new plug-ins and Trio versions. If users encounter any problems, the supporters at Stanford University, such as Martin Theobald, will provide solutions. There are also many resources available to help users better understand Trio DBMS in terms of design/architecture, components, query structure, and more.
10.1.2 Advanced components in relational DBMS
Current relational DBMSs, such as Oracle, SQL Server, and DB2, handle well-structured data easily, but not uncertain data or lineage. In this respect the Trio DBMS is more advanced. In the Stanford Trio project, the ULDB was successfully built on top of PostgreSQL. I believe it could later be implemented on any relational DBMS as a plug-in.
10.1.3 Computing confidences
Computing confidences is one of the functions of the Trio system. It is important for calculating the probability of each tuple. From a marketing perspective, these probabilities give users a better basis for decision making, data analysis, and statistics.
10.1.4 Efficient, convenient, safe, multi-user storage of and access to massive, persistent data [6]
Data access is persistent and convenient because all data is stored in simple tables (“relations”), and queries and updates go through a simple but powerful declarative language (SQL). Transactions make data access safe and allow multi-user access. The DBMS's storage and indexing structures make it efficient on massive data.
10.2 Disadvantage
10.2.1 Time cost for query
In Trio DBMS, querying uncertainty, lineage, and confidences, or even running regular SQL, costs more time than in a regular relational DBMS, because Trio needs to create a “super tuple” of alternatives, which requires several extra steps. In the example below, the ULDB takes four steps where the relational database takes only two, so the relational query is expected to be faster than the ULDB query.
The query ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’ from [6] serves as the example for comparing a relational database with a ULDB.
Over standard relational database:
For each tuple in cross-product of X1, X2, ..., Xn
1. Evaluate the predicate
2. If true, project attr-list to create result tuple
Over ULDB:
For each tuple in cross-product of X1, X2, ..., Xn
1. Create “super tuple” T from all combinations of alternatives
2. Evaluate predicate on each alternative in T ; keep only the true ones
3. Project attr-list on each alternative to create result tuple
4. Details: ‘?’, lineage, confidences
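The two procedures above can be compared with a small Python sketch that counts predicate evaluations. It is an illustration under assumptions rather than Trio's algorithm: the function names are hypothetical, tables are plain Python lists, confidences are ignored, and step 4 ('?', lineage, confidences) is left out. The relational tables are built by keeping only the first alternative of each x-tuple.

```python
from itertools import product


def relational_query(tables, predicate, project):
    evaluations, result = 0, []
    for combo in product(*tables):            # each tuple of the cross-product
        evaluations += 1                      # step 1: evaluate the predicate
        if predicate(*combo):
            result.append(project(*combo))    # step 2: project attr-list
    return result, evaluations


def uldb_query(tables, predicate, project):
    evaluations, result = 0, []
    for xcombo in product(*tables):           # each combination of x-tuples
        super_tuple = list(product(*xcombo))  # step 1: all combinations of alternatives
        kept = []
        for combo in super_tuple:
            evaluations += 1                  # step 2: predicate on each alternative
            if predicate(*combo):
                kept.append(project(*combo))  # step 3: project surviving alternatives
        if kept:
            result.append(kept)               # step 4 is omitted in this sketch
    return result, evaluations


drives = [  # alternatives from the script above, confidences ignored
    [("Frank", "red", "Toyota"), ("Frank", "blue", "Toyota")],
    [("Billy", "blue", "Honda")],
    [("Jimmy", "green", "Mazda"), ("Johnny", "green", "Mazda")],
]
saw = [[("Diane", "red", "Toyota"), ("Diane", "blue", "Toyota")]]


def same_car(d, s):
    return d[1] == s[1] and d[2] == s[2]


def suspect(d, s):
    return (s[0], d[0], d[1], d[2])


flat_drives = [xt[0] for xt in drives]  # one regular tuple per row
flat_saw = [xt[0] for xt in saw]

rel_rows, rel_evals = relational_query([flat_drives, flat_saw], same_car, suspect)
uldb_rows, uldb_evals = uldb_query([drives, saw], same_car, suspect)
```

With the same number of rows in each table, the relational version evaluates the predicate 3 times, while the ULDB version evaluates it 10 times because every pair of alternatives must be checked.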
10.2.2 Dependency
Trio DBMS is not a complete DBMS; it depends on an underlying relational DBMS such as PostgreSQL. TrioExplorer was built using Python and other packages. When some of these packages are later updated to new versions, for instance Graphviz 2.16, they may no longer be compatible with TrioExplorer, which causes problems. Therefore, keeping Trio DBMS running requires effort to maintain, reconfigure, and update.
10.2.3 On development stage - The Trio project has not yet been developed completely with all functionalities, such as TrioExplorer for DB administration, DB security, and more.
11.0 Documenting Trio System report/bugs
During the experiment with the Stanford Trio project, I went through the Trio installation, TrioExplorer, TrioPlus, and TriQL. Several places contain mistakes or are incorrect or incomplete.
11.1. The installation instructions on the website http://dbpubs.stanford.edu:8011/doku.php/trio:installation do not clearly indicate which version of Graphviz the Trio system needs. The Graphviz website http://www.graphviz.org/ offers only version 2.16, which is not compatible; version 2.14 is. Graphviz version 2.14 is available for download at http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .
11.2. The Windows authentication superuser needs to be created first in PostgreSQL in order to connect to the database. Once the connection is established, TrioExplorer and TrioPlus can use the superuser's login and password as Windows authentication to access the database system. However, the installation procedure does not mention at all how to create this type of new user. The only way to solve it is to create the user manually with PostgreSQL -> pgAdmin III.
11.3. After creating the superuser, I had to modify some code in setup.py in the Trio-1.0->setup directory in order to run ‘python setup.py -u myname -d myname -p’.
11.4. Many of the sample TriQL query statements at http://infolab.stanford.edu/~widom/triql.html#options do not work as desired.
12.0 Written Components
12.1 Turn-in Documents
Turn in the special interest activity task plan summary, the activity deliverables, standard, minimal documentation of the hands-on work, and an electronic copy of the activity.
12.2 Demo and Presentation
12.2.1 Presentation of Trio database system architecture
Brief description of the Trio database system architecture, ULDB data model, Python API, TriQL query structure, features, step-by-step installation instructions, step-by-step Python server installation instructions, and more.
12.2.2 Demo
Launch the Python server, PostgreSQL, TrioExplorer, TrioPlus, and TriQL (create tables and indexes; insert, select, update, and more query statements).
12.3 Reflective written component
After spending many hours experimenting with the Trio project, I learned about Python (server, coding, and executing programs), PostgreSQL (how it differs from other DBMSs, installation, etc.), Trio (installation, architecture, TriQL, ULDB, TrioExplorer, and TrioPlus), and more. During the experiment, I found that several things are very important for reaching my goal: support, resources, environment, and scale. Support means knowing where to get help when problems are encountered. Resources provide a basic picture of the entire system, so I know what I am working on. Environment is where the project is run: what the operating system is, what the database server is, and more. The last one is scale. To complete the project on time, I have to know how big the project is in order to manage my time. I felt that this was a great experience for building skill and ability in troubleshooting, problem solving, and adopting new technology.
13.0 References
[1] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R.
Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS.
Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07),
Pacific Grove, California, January 2007. Demonstration description.
[2] O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio
System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16,
March 2006.
[3] Trio: A System for Integrated Management of Data, Uncertainty, and Lineage. Retrieved on November 18, 2007 from http://infolab.stanford.edu/trio/ .
[4] PostgreSQL. Retrieved on November 20, 2007 from http://www.postgresql.org/.
[5] Python. Retrieved on November 15, 2007 from http://www.python.org/.
[6] J. Widom. Trio: A System for Data, Uncertainty, and Lineage. Presentation given at various venues, 2006-07. PowerPoint slides.