Table of contents:
1.0 Understanding the new Trio database system
    Trio System Architecture
    Uncertainty-Lineage Database (ULDB)
    TriQL: The Trio Query Language
2.0 System requirements for installing the Trio system
3.0 Source code/packages for the Trio system
4.0 Procedure of Trio system installation
    Python
    Readline
    Ctypes
    PostgreSQL
    Graphviz
    Easy_install
    PyGreSQL
    Pylons
    PLY
    PyParsing
    PyDot
    Trio API
5.0 Configuration of Trio System
    Windows superuser authentication to access PostgreSQL
    TrioExplorer
    TrioPlus
6.0 Experiment Trio DBMS Using TrioExplorer and TrioPlus
    TrioExplorer (Home, Schema, Script, and Help)
    TrioPlus
7.0 Trio API Integrated Into Other Python Scripts
    TrioCnx
    TrioCursor
    XTuple
    Alternative
8.0 Trio API and Translator (Python)
9.0 Trio Query Language (TriQL) Structure
    Currently supported TriQL
    Drop index/table
    Create Trio table/index
    TriQL language (uncertainty, Compute confidences)
    Table of TriQL contents
10.0 Advantage and Disadvantage of using Trio DBMS
    Advantage
    Disadvantage
11.0 Documenting Trio system report/bugs
12.0 Written Components
    Turn-in documents
    Demo and presentation
13.0 References

Trio Database System Management

1.0 Understanding of Trio new Database System:
Trio is a new kind of database management system (DBMS), developed by a Stanford University lab in December 2006.
It is based on an extended relational model called the Uncertainty-Lineage Database (ULDB) [1], and it supports Trio’s own query language, TriQL [1]. This new database technology handles structured data, data uncertainty, and data lineage together in a fully integrated manner. To understand how Trio differs from other DBMSs, we look at its architecture and design, the ULDB data model, and TriQL.

1.1 Trio System architecture:
M. Mutsuzaki, M. Theobald, et al. [1] describe the Trio-1.0 system architecture, design, and data structures in detail. As seen in Fig 1, the Trio DBMS comprises four primary components: the command-line client (TrioPlus), TrioExplorer, the Trio API and translator (Python), and a standard relational DBMS (PostgreSQL). The relational DBMS in turn holds four essential elements: encoded data tables, lineage tables, Trio metadata, and Trio stored procedures. Together these four elements implement the ULDB [1], which handles uncertain data and lineage during execution. Trio is more flexible and powerful than a regular DBMS in that, as stated in [1], “The Trio API accepts TriQL queries in addition to regular SQL, and query results may be x-tuples in the ULDB model as well as regular tuples.”
(Fig 1) System architecture [1]

1.2 Uncertainty-Lineage Database (ULDB)
The ULDB (Uncertainty-Lineage Database) is described in [2] as “the first database formalism to integrate uncertainty and lineage”. Handling uncertainty and lineage together is the key idea behind the Trio DBMS. What are uncertainty and lineage? According to O. Benjelloun, A. Das Sarma, et al. [2], uncertainty is “…captured by tuples that may include several alternative possible values, with optional confidence values associated with each alternative”, and lineage “…associates with a data item information about its derivation.” With these concepts in place, the next step is the ULDB data model, which extends the standard SQL relational model with alternatives, maybe (‘?’) annotations, numerical confidences, and lineage [2]. In the following discussion and demo, I use the uncertain tables Drives (person, color, car) and Saw (witness, color, car), with and without confidences, as examples. In TrioExplorer, a table shown in blue is an uncertain table, and a table shown in orange is a table with confidences. The source code is listed in section 9.0, Trio Query Language (TriQL) Structure.

1.2.1 Alternatives
In a ULDB, a relation (x-relation) consists of x-tuples. Each x-tuple has one or more alternatives. As defined in [2], alternatives represent uncertainty about the contents of a tuple. For example, in Fig 3 the records for Frank and Jimmy are x-tuples with more than one alternative, marked by the annotation ‘||’. Clicking on ‘||’ splits the x-tuple into its two alternatives, for both Frank and Jimmy, as shown in Fig 4. Each alternative is a regular tuple. Regular tuples are combined into one x-tuple because the possible instances share the same values in particular fields.
(Fig 2) uncertainty tables
(Fig 3) Drives table
(Fig 4) Drives table

1.2.2 ‘?’ (Maybe) Annotations
A ‘?’ annotation indicates that the existence of the x-tuple itself is uncertain; such a tuple is called a maybe x-tuple [2]. For example, in Fig 3 there is a ‘?’ to the right of Billy’s record: it is not certain that Billy drives a blue Honda, so the tuple may be absent altogether. If an x-tuple has a ‘?’, the ‘?’ applies to all of its alternatives.

1.2.3 Numerical Confidences
Numerical confidences can be interpreted as probabilities [2].
Fig 5 shows the tables with confidences. When we view the tuples of the Drives table, a number attached to each x-tuple indicates its confidence, as in Fig 6: Frank has 1.0, Johnny/Jimmy has 1.0, and Billy has 0.9. Fig 7 shows the detailed probability of each alternative. The confidences of the alternatives of one x-tuple must sum to at most 1.0 [2]. In this example, I inserted the data into the Drives table together with the confidences. Optionally, TrioExplorer computes the confidences as probabilities if none are available.
(Fig 5) w/confidence tables
(Fig 6) Drives table
(Fig 7) Drives table

1.2.4 Lineage
Lineage, as defined in [2], is “recorded at the granularity of tuple alternatives: Lineage connects an x-tuple alternative to other x-tuple alternatives.” Specifically, lineage is a function that maps each alternative to the set of alternatives from which it was derived. For example, when the query ‘select person from Drives’ is executed in TrioExplorer, the result is as shown in Fig 8. The blue arrow in the picture represents the lineage. Clicking on it shows that the Billy result was derived from the complete Billy record in Drives. With lineage built into the DBMS, users can trace how each query result was derived and so better understand the relationships in the data.
(Fig 8) Drives table
(Fig 9) Drives table

1.3 TriQL: The Trio Query Language
TriQL was developed specifically for querying the Trio DBMS. It has two major parts: regular SQL syntax, and built-in functions and predicates for querying confidence values and lineage [2]. As mentioned in [1], TriQL queries over ULDBs are translated automatically into SQL queries over the encoded tables. TriQL syntax and structure are not discussed here; they are covered with a demo in section 9.0, Trio Query Language (TriQL) Structure.

2.0 System Requirement of Installing Trio System
Knowing the system requirements before installation is an important step.
It makes the installation of the Trio system much easier and avoids compatibility problems. The requirements mainly concern the operating system, the version of the database server (PostgreSQL), and Python. The web page [3] lists the operating systems on which Trio has been run successfully, including Linux, Mac OS X, and Win32 (XP, Vista, and 32-bit Server). PostgreSQL, the database server underlying Trio, is available from www.postgresql.org in several versions, including 8.2.5, 8.1.10, 8.0.14, and 7.4.18 [4]. According to the PostgreSQL website, it supports only Linux and Win32. The best choice for Trio is PostgreSQL 8.1.10. The last requirement is Python, which runs on Windows, Linux/Unix, Mac OS X, OS/2, and Amiga [5]. In the Trio system installation section, I used Windows XP Professional as the operating system on which to install Trio.

3.0 Source Codes/packages for Trio System
The page http://dbpubs.stanford.edu:8011/doku.php/trio:installation lists the 12 packages needed to complete the installation of the Trio system. Each package has its own website, which offers several versions of the source code together with installation instructions. This section lists all of these resources in one place so that users can install them properly. (Please use the listed versions of the packages.)

3.1 Listing source codes:
1. Python 2.4 can be downloaded from http://www.python.org/ .
2. Easy_install can be downloaded from http://peak.telecommunity.com/DevCenter/EasyInstall ; the file is called ez_setup.py.
3. Readline 1.7.win32 can be downloaded from http://www.python.org/ .
4. ctypes-1.0.2.win32-py2.4 can be downloaded from http://www.python.org/ .
5. PostgreSQL 8.1 can be downloaded from http://www.postgresql.org/ .
6. Graphviz 2.14 is the only version compatible with the Trio API. It is available at http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .
7. PyGreSQL can be downloaded from http://www.pygresql.org/ .
8. Pylons 0.9.5 can be downloaded from http://pylonshq.com/ .
9. PLY 2.2 can be downloaded from http://www.dabeaz.com/ply/ .
10. PyParsing can be downloaded from http://pyparsing.wikispaces.com/ .
11. PyDot can be downloaded from http://code.google.com/p/pydot/downloads/list .
12. Trio API 1.0 can be downloaded from http://infolab.stanford.edu/~theobald/sources/TRIO.zip .

4.0 Procedure of Trio system installation
Since Trio is deployed on top of the packages above, it is easiest to install them one by one in the order below, because some packages depend on others. For convenience, most of the required Python modules (Pylons, PyGreSQL, etc.) can be installed via Easy Install on Windows platforms. Most of the procedures below are taken from [3], modified as needed.

4.1. Python
o Download the Python 2.4 Windows version (python-2.4.4.msi).
o Install Python in the C:/Python directory.
o Set path=c:/Python24; in the environment variables.

4.2. Readline - Download Readline-1.7.win32-py2.4.exe and install it into the Python directory.

4.3. Ctypes - Download ctypes-1.0.2.win32-py2.4.exe and install it into the Python directory.

4.4. PostgreSQL
o Download the PostgreSQL 8.1 Windows version (postgresql-8.1.msi).
o Install PostgreSQL 8.1 as follows:
    - Language selection (Fig 10): English.
    - Introduction screen (Fig 11): next.
    - Welcome message and instructions (Fig 12): next.
    - Feature selection (Fig 13): next.
    - Service installation (Fig 14): check "install as a service" and enter the account name ‘postgres’ and a password.
    - Initdb (Fig 15): check "initialize database cluster" and enter the superuser name and password.
    - Procedural languages (Fig 16): check PL/pgsql only.
    - Contrib modules (Fig 17): check Admin81 only.
    - Click next in Fig 18, 19, 20, and 21 to complete the installation.
o After the installation completes, add C:\Program Files\PostgreSQL\8.1\bin; to the path.
(Fig 10) through (Fig 21): PostgreSQL installer screens [4]

4.5. Graphviz - Download Graphviz version 2.14 and install it on your workstation. After the installation completes, add C:\PROGRA~1\ATT\Graphviz\bin; to the path in the environment variables.

4.6. Easy_install - Download ez_setup.py into the C:/ directory.

4.7. PyGreSQL - At the command line, cd\ to the c: directory and run python ez_setup.py PyGreSQL to install the components.

4.8. Pylons - At the command line, cd\ to the c: directory and run python ez_setup.py Pylons==0.9.5 to install Pylons. Add c:\python24\Scripts to the path in the environment variables.

4.9. PLY - At the command line, cd\ to the c: directory and run python ez_setup.py Ply==2.2 or easy_install Ply==2.2.

4.10. PyParsing - At the command line, cd\ to the c: directory and run python ez_setup.py PyParsing.

4.11. PyDot - Download the source from the website. Change to its folder at the command line and install it manually by running ‘python setup.py install’.

4.12. Trio API
o Download the source code into any directory.
o Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and rename it to triospi.dll.

5.0 Configuration of Trio System

5.1. Windows superuser authentication to access PostgreSQL
o Open Start -> All Programs -> PostgreSQL 8.1 -> pgAdmin III.
o Right-click on Login Roles to create a new login role (see Fig 22):
    - Role name ‘myname’ (the same as the Windows login account).
    - Set a password (any password will do).
    - Check all role privileges and click OK.
o Right-click on Database to create a new database (see Fig 23):
    - Database name ‘myname’ (the same as the username).
    - Owner is ‘myname’; click OK.
o Initialize the Trio schema information (see Fig 24):
    - In Trio-1.0\setup, open setup.py with Notepad, comment out the last three lines of code, and add the following lines in their place. (After the initialization completes, restore the original file.)
os.system("psql %s %s < setup.sql" % (pgdbname, username))
os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username))
os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username))
(Fig 22)
    - Save the file. At the command line, cd \Trio-1.0\setup and run ‘python setup.py myname myname’. Enter the password when prompted to create the schema.
(Fig 23)
(Fig 24)

5.2. TrioExplorer [3]
o Make sure PostgreSQL is running.
o To run TrioExplorer, make sure ‘c:\python24\Scripts;’ is on the path in the environment variables, then double-click ‘start_te_server.bat’ under Trio-1.0\explorer.
o At the command line, you are prompted for an admin user login to PostgreSQL. This account should have been created along with your PostgreSQL installation, and TrioExplorer uses it to create new user roles and database instances.
o TrioExplorer should now be reachable from your browser at http://localhost:8080/. New users can press ‘Create a new user’ to create their own Trio login and database instances, which are then managed by the PostgreSQL server.

5.3. TrioPlus [3] - To use TrioPlus, a new user role and database instance must be set up manually. They should be different from the default PostgreSQL instance, and setting them up requires logging on to the PostgreSQL server with admin privileges.
o Create a new PostgreSQL user role and database instance:
    - Run ‘createuser demo’.
    - Run ‘createdb demo’; the database name must be the same as the username.
o Initialize the Trio schema information for the new user in the same way as in section 5.1, Windows superuser authentication to access PostgreSQL. With TrioExplorer this is easier: just press ‘Create new role’ in the web interface.
o Connect to the new Trio database with the command-line client by running ‘python trioplus.py -u demo -d demo -p’.

6.0 Experiment Trio DBMS Using TrioExplorer And TrioPlus

6.1 TrioExplorer:
Fig 25 shows the TrioExplorer interface on the home tab after logging in with a username and password.
The left side of the web page lists the symbols used to show the types of tables and records. The body of the page contains a text-field area for entering TriQL. After pressing ‘execute’, the result of the statement appears underneath.
(Fig 25)
Fig 26 shows the schema tab in TrioExplorer. Once the tables of the database have been created, the diagram shows the relations between tables, the table types, and the attributes. The left side lists the symbols used in the diagram and their meanings.
(Fig 26)
Fig 27 shows the sample tab. It loads the sample script that ships with the application for testing purposes. After the sample script is loaded, the application automatically executes its queries in the DBMS. A copy of the sample script is also listed in the body of the page.
(Fig 27)
Fig 28 shows the script tab in TrioExplorer, which loads a script from a file on your workstation into the application for execution. Once the script has loaded successfully, you can click on a line of the script to copy that single line into the text field and execute it in the DBMS.
(Fig 28)
Fig 29 shows the help tab, which covers common questions such as how to use each tab and how to create a database.
(Fig 29)

6.2 TrioPlus
As seen in Fig 30, change to the Trio-1.0/ directory at the command line and type ‘python trioplus.py -u myname -d myname -p’ to connect to the PostgreSQL server. Note that the myname account must first have been created in the Trio DBMS. Once the connection to PostgreSQL is established, you can execute any TriQL statement at the ‘>’ prompt, just as in the text field of TrioExplorer.
(Fig 30)

7.0 Trio API Integrated Into Other Python Scripts
The Trio API provides many methods, in the modules Triodb.py and xtuple.py under the Trio-1.0 directory, that can be used from other Python scripts. Importing both modules into a Python script gives it the additional capability to handle tuple alternatives, confidence computation, and lineage.
A closer look at the classes shows that the Trio API methods were designed to work much like the Python DB-API interface [3]. The Trio API methods described below are taken from [3].

7.1 TrioCnx
7.1.1. TrioCnx(pgdb) - Constructor; creates a new Trio connection from a given PyGreSQL connection pgdb, the default Python DB-API.
7.1.2. cursor() - Returns a new TrioCursor object for the current connection.
7.1.3. commit() - Commits the current transaction.
7.1.4. rollback() - Performs a rollback of the current transaction.
7.1.5. close() - Closes the Trio connection (and the underlying pgdb connection).

7.2 TrioCursor
7.2.1. execute(triql) - Executes a TriQL statement triql for this cursor object.
7.2.2. fetchone() - Fetches a single XTuple object from the current cursor position.
7.2.3. fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor position.

7.3 XTuple
7.3.1. len() - Returns the number of Alternative objects contained in this XTuple object.
7.3.2. getAlternative(idx) - Returns the Alternative object at the designated index idx.
7.3.3. getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its Alternative objects' confidence values.
7.3.4. getQuestionMark() - Returns whether this XTuple object has a question mark or not.

7.4 Alternative
7.4.1. getLineage() - Returns a list of immediate lineage information for this alternative, as (source-table, source-aid) pairs.
7.4.2. traceLineage() - Performs a transitive lineage traversal for this alternative back to the base data.
7.4.3. getConfidence() - Returns the confidence value (if any) of this alternative.
7.4.4. computeConfidence() - Computes the confidence value of this alternative based on the traceLineage() function.

8.0 Trio API And Translator (Python)
The Trio API and translator (Python) is one of the important components of the Trio system.
Every TriQL statement entered by a user, from either TrioExplorer or TrioPlus, is translated into operations on the encoded data tables, lineage tables, Trio metadata, and/or Trio stored procedures of the ULDB, and is then executed in the relational database. Node.py in the Trio-1.0 directory and lexer.py in the Trio-1.0\trioparser directory contain the classes, methods, and TriQL keyword definitions that support and validate TriQL queries through the Trio API.

TriQL built-in functions:
Aggregate(), Alternative(), As(), Avg(), BinaryExpression(), Brackets(), Cascade(), ColumnList(), Column(), Command(), ComputeConfidences(), Conf(), ConfUniform(), ConfScaled(), Count(), CreateIndex(), CreateTableAs(), CreateTable(), CreateTempTableAs(), CreateTempTable(), DataType(), Dot(), DropIndex(), DropTableList(), DropTable(), Eavg(), Ecount(), EcountStar(), Emax(), Emin(), Empty(), Esum(), FromClause(), GroupAlts(), GroupbyClause(), GroupByKey(), HavingClause(), HorizontalSelect(), IdentifierList(), InsertList(), Insert(), InsertSpec(), Lineage(), Literal(), Lsum(), Max(), Maybe(), Min(), OrderBy(), Question(), SelectClause(), Select(), SelectOptions(), SetOperator(), Star(), Sum(), UnaryExpression(), UncertainSet(), UniformTable(), ViewTable(), WhereClause(), WithConfidences()

TriQL keywords:
'all', 'and', 'any', 'as', 'avg', 'by', 'cascade', 'compute', 'conf', 'confidences', 'count', 'create', 'distinct', 'drop', 'eavg', 'ecount', 'emax', 'emin', 'esum', 'except', 'exists', 'flatten', 'float', 'float4', 'float8', 'from', 'group', 'groupalts', 'having', 'in', 'index', 'insert', 'int', 'int4', 'int8', 'intersect', 'into', 'is', 'like', 'lineage', #'lavg', #'lcount', #'lmax', #'lmin', #'lsum', 'max', 'maybe', 'min', 'noconf', 'nolineage', 'nomaybe', 'not', 'null', 'on', 'or', 'order', 'scaled', 'scaledbyexp', 'select', 'sum', 'table', 'temporary', 'trio', 'trio_aid', 'trio_xid', 'uncertain', 'uniform', 'union', 'merged', 'values', 'varchar', 'view', 'where', 'with'

9.0 Trio Query Language (TriQL) Structure
As mentioned in [3], “Trio is based on the ULDB model, an extension to the relational model that adds both uncertainty (U) and lineage (L) as first-class concepts.” The Trio Query Language (TriQL) is therefore designed for querying and updating a ULDB database through the encoded data tables, lineage tables, Trio metadata, and Trio stored procedures. As discussed earlier, the ULDB model extends the standard relational model, so TriQL contains regular SQL queries as well as built-in functions to handle uncertainty, query and compute confidence values, and derive data lineage.

9.1 Currently supported TriQL

Supported DDL and Insert Commands:
o Create Table
o Create Trio Table
o Create Table T as <query>
o Create Index
o Drop Table
    - Including “Cascade” option to drop derived tables
o Drop Index
o Insert Into T Values <data>
    - With or without alternatives, confidence values, and ?
    - T must be a base table
o Insert Into T <TriQL query>
    - T must be a base table

Supported Subset of TriQL Query Language:
o Select-Project-Join queries
o Select Distinct
o Union, Intersect, or Except of two subqueries (duplicate-eliminating)
o Union All of two subqueries
o Merged, Flatten, and GroupAlts
o Horizontal subqueries in Where clause
    - Including shortcuts and aggregation/group-by/having
o Horizontal subqueries in Select clause
    - Including shortcuts and aggregation but not group-by/having
o Conf() function with any number of tables or Conf(*)
    - Except for Conf(*) in the SELECT clause in conjunction with DISTINCT or MERGED
o Maybe() predicate
o “Uniform <table-name>” in From clause
o “as conf” for query-defined result confidence values
    - Including “Uniform as conf” and “Scaled as conf”
o “Compute Confidences” at end of query
o Lineage() predicate
    - Not in horizontal subqueries
    - “==>” abbreviation allowed
o NoLineage, NoConf, and NoMaybe

Based on the TriQL samples at http://infolab.stanford.edu/~widom/triql.html, I created the TriQL tables and indexes and inserted the uncertain data and confidences for the demo.
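
The Insert Into T Values form listed above takes alternatives, optional confidence values, and an optional ‘?’. As a rough illustration only, the Python sketch below builds such statement strings in the syntax used by the demo script in section 9.3. The helper names (sql_literal, format_alternative, format_insert) are hypothetical and are not part of the Trio API. The choice to bracket the data only when there are several alternatives or confidences simply mirrors the demo script and is an assumption about what Trio accepts. The sketch is written for a current Python 3 interpreter, not the Python 2.4 used for the installation.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (None becomes NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, str):
        # Double any embedded single quote, as in standard SQL.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def format_alternative(values, confidence=None):
    """Render one alternative, e.g. ('Amy','blue','Honda'):0.4."""
    body = "(" + ",".join(sql_literal(v) for v in values) + ")"
    if confidence is not None:
        body += ":" + repr(float(confidence))
    return body


def format_insert(table, alternatives, confidences=None, maybe=False):
    """Build a TriQL-style insert statement string (hypothetical helper).

    alternatives: list of value tuples, one per alternative.
    confidences:  optional list of floats, one per alternative.
    maybe:        append the '?' annotation when True.
    """
    if not alternatives:
        raise ValueError("an x-tuple needs at least one alternative")
    if confidences is None:
        confidences = [None] * len(alternatives)
    if len(confidences) != len(alternatives):
        raise ValueError("one confidence per alternative is required")

    parts = [format_alternative(a, c)
             for a, c in zip(alternatives, confidences)]
    data = " | ".join(parts)
    # Assumption: follow the demo script, which uses [...] for several
    # alternatives or for confidences, and a bare tuple otherwise.
    if len(parts) > 1 or any(c is not None for c in confidences):
        data = "[" + data + "]"
    suffix = " ?" if maybe else ""
    return "insert into %s values %s%s;" % (table, data, suffix)


if __name__ == "__main__":
    print(format_insert("Saw",
                        [("Amy", "blue", "Honda"), ("Amy", "red", "Toyota")],
                        maybe=True))
    print(format_insert("Drives",
                        [("Billy", "blue", "Honda")],
                        confidences=[0.9]))
```

The generated strings could then be pasted into the TrioExplorer text field or typed at the TrioPlus prompt like any other TriQL statement.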

9.2 Drop index/table - TriQL uses the ‘drop index’ and ‘drop table’ statements to delete indexes and tables, just as in standard SQL.

drop index DRIVES_INDEX;
drop table Drives;
drop index SAW_INDEX;
drop table Saw;

9.3 Create Trio table/index - To create a TriQL table that holds uncertain data, the word ‘trio’ must appear in the create statement (‘create trio…’), as below. The ‘uncertain()’ clause in the same statement defines which fields of the table are treated as uncertain. If the statement ends with ‘with confidences’, confidences can be inserted into the table and computed during query execution. ‘[…|…]’ denotes an x-tuple whose alternatives are separated by ‘|’. ‘?’ is the maybe annotation described above. ‘:number’ is the confidence of an alternative.

------------------------------------------------------------
-- Creating tables and indexes, inserting data
------------------------------------------------------------
create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car));
create index SAW_INDEX on Saw(color, car);
insert into Saw values [('Amy','blue','Honda') | ('Amy','red','Toyota')] ?;
insert into Saw values [('Betty','green','Mazda') | ('Betty','green','Toyota') | ('Betty','green',NULL)];
insert into Saw values ('Cathy','red','Acura') ?;
insert into Saw values [('Diane','red','Toyota') | ('Diane','blue','Toyota')];
create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car));
create index DRIVES_INDEX on Drives(person,color, car);
insert into Drives values [('Frank','red','Toyota') | ('Frank','blue','Toyota')];
insert into Drives values ('Billy','blue','Honda') ?;
insert into Drives values [('Jimmy','green','Mazda') | ('Johnny','green','Mazda')];

------------------------------------------------------------
-- With confidence value
------------------------------------------------------------
create trio table Saw(witness varchar(32), color varchar(32), car varchar(32), uncertain(color, car)) with confidences;
create index SAW_INDEX on Saw(color, car);
insert into Saw values [('Amy','blue','Honda'):0.4 | ('Amy','red','Toyota'):0.3];
insert into Saw values [('Betty','green','Mazda'):0.5 | ('Betty','green','Toyota'):0.2 | ('Betty','green',NULL):0.3];
insert into Saw values [('Cathy','red','Acura'):0.6];
insert into Saw values [('Diane','red','Toyota'):0.2 | ('Diane','blue','Toyota'):0.8];
create trio table Drives(person varchar(32), color varchar(32), car varchar(32), uncertain(person, color, car)) with confidences;
create index DRIVES_INDEX on Drives(person,color, car);
insert into Drives values [('Frank','red','Toyota'):0.7 | ('Frank','blue','Toyota'):0.3];
insert into Drives values [('Billy','blue','Honda'):0.9];
insert into Drives values [('Jimmy','green','Mazda'):0.4 | ('Johnny','green','Mazda'):0.6];

9.4 TriQL language

9.4.1 Uncertainty
To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS without confidences, copy the first part of the script above into the text-field area on the home tab and press ‘execute’. The application executes the script and creates the two tables, Drives and Saw, as seen in Fig 31. All the data was successfully inserted into the tables. The green color of the table indicates uncertainty.
(Fig 31)

Selection - Fig 32 shows the result of executing the TriQL query ‘select * from Saw where car = 'Toyota'’, which selects all records from the Saw table whose car is a Toyota. In the table, the blue arrow on the left side represents the lineage, and the ‘?’ on the right side is the maybe annotation. The 3D symbol on the last record indicates that it contains two tuples (an x-tuple).
(Fig 32)

Projection - Fig 33 shows the result of executing the query ‘Select color From Saw Where car = 'Mazda' or car = 'Toyota'’.
This query finds the colors of the sighted Mazdas and Toyotas.
(Fig 33)

Projection (Merged) - Fig 34 shows the result of executing the query ‘Select Merged color From Saw Where car = 'Mazda' or car = 'Toyota'’. This query also finds the colors of the sighted Mazdas and Toyotas, but Merged eliminates horizontal duplicates, that is, duplicates among the alternatives of a tuple.
(Fig 34)

Join - Fig 35 shows the result of executing the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car’. The query joins the Saw and Drives tables on matching color and car. Adding Merged to the query, as in ‘Select Merged D.person as suspect, D.color From Drives D, Saw S Where D.color = S.color’, eliminates the duplicate tuples.
(Fig 35)

Duplicate-Elimination (Distinct) - Fig 36 shows the result of executing the query ‘Select Distinct color, car from Saw’. The query scans the entire Saw table and returns the color and car of each tuple. Distinct eliminates vertical duplicates, that is, duplicate tuples across the result.
(Fig 36)

Flatten - Fig 37 shows the result of executing the query ‘Select Flatten * From Saw’. Flatten turns tuples with alternative values into regular tuples. The query converts the table Saw into a list of sightings [3].
(Fig 37)

GroupAlts - Fig 38 shows the result of executing the query ‘Select GroupAlts(color) * From Saw’. The query takes a list of alternatives and reorganizes the Saw table so that the data is ‘keyed’ on color, with uncertainty about car and witness [3]. The GroupAlts function is used to create or restructure alternative values.
(Fig 38)

Horizontal Subqueries: The [ ] in the Where Clause - Fig 39 shows the result of executing the query ‘Select * From Saw Where 2 <= [Select Count(*) From Saw Where car = 'Toyota']’. The ‘[ ]’ in the statement is treated as a horizontal subquery. The query finds all tuples with at least two alternatives involving Toyotas.
The [ ] construct can also be used in the Select clause, as in ‘Select Merged witness, [Count(Distinct color)] as #colors From Saw’.
(Fig 39)

Syntactic Shortcuts in [ ] for implicit table-list - Fig 40 shows the results of executing the query ‘Select * From Saw Where color <=All [color]’. Color is one of the attributes of table Saw. The query performs lexicographic color comparisons. The shortcut in [ ] can also replace a ‘Select *’ subquery, as in ‘Select * From Saw Where Exists [car = 'Toyota']’.
(Fig 40)

9.4.2 Compute Confidences
To create Saw (witness, color, car) and Drives (person, color, car) in the DBMS with confidences, copy the second part of the above script into the text field under the Home tab and press ‘execute’. The application executes the script and creates the two tables, Drives and Saw, with confidences, as seen in Fig 41. The table color changes to orange, which indicates confidences. All the data was inserted successfully.
(Fig 41)

Built-in Function Conf() - Fig 42 shows the results of executing the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(D) > 0.3 And Conf(S) > 0.3’. Conf() takes a single table name or variable from the ‘From’ clause as a parameter and returns the confidence of the current alternative being evaluated from that table. The query joins the Drives and Saw tables to generate suspects, keeping only alternatives with confidence > 0.3.
(Fig 42)

Conf() for Multi-table - In the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And Conf(S) > 0.3 And Conf(*) > 0.2’, the term Conf(*) > 0.2 is a condition on the result alternatives: only those with confidence > 0.2 are kept.

Built-in function Maybe() - Fig 43 shows the results of executing the query ‘Select S.witness, D.person as suspect, D.color, D.car From Drives D, Saw S Where D.color = S.color and D.car = S.car And (Not Maybe(D)) And (Not Maybe(S))’. Maybe() takes a table name or variable from the ‘From’ clause as a parameter and returns true if and only if the current alternative being evaluated from that table comes from a tuple with a ‘?’.
(Fig 43)

Uniform result confidence - Fig 44 shows the results of executing the query ‘Select *, uniform as conf From Saw’. The uniform keyword used with ‘as conf’ assigns confidence values to each tuple as follows: if the tuple has n alternatives and no ‘?’, each alternative is assigned confidence 1/n; if the tuple has n alternatives and a ‘?’, each alternative is assigned confidence 1/(n+1). The query computes and returns these confidences.
(Fig 44)

Scaled result confidence - Fig 45 shows the results of executing the query ‘Select *, scaled as conf From Saw’. The scaled keyword used with ‘as conf’ assigns confidence values to each tuple as follows: if the tuple has no confidence values and n alternatives, each alternative is assigned confidence 1/n; if the tuple has confidence values that sum to s, each alternative is assigned confidence c/s, where c is the existing confidence value of that alternative. The query computes and returns these confidences.
(Fig 45)

9.5 Table of all possible TriQL contents
The TriQL Query Language reference in [3] (website http://infolab.stanford.edu/~widom/triql.html#options ) lists all the constructs and built-in functions for querying ULDBs. For convenience, I collected them in the table in this section.
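The uniform and scaled rules described in section 9.4.2 come down to simple arithmetic, which the following Python sketch makes concrete. It only restates the two rules as written above and is not Trio code: the function names uniform_conf and scaled_conf are hypothetical, and the sample inputs are illustrative values rather than query results.

```python
def uniform_conf(n_alternatives, has_maybe):
    """Uniform rule: 1/n per alternative, or 1/(n+1) when the tuple has a '?'."""
    if n_alternatives < 1:
        raise ValueError("a tuple needs at least one alternative")
    denominator = n_alternatives + 1 if has_maybe else n_alternatives
    return [1.0 / denominator] * n_alternatives


def scaled_conf(confidences):
    """Scaled rule for one tuple, given one entry per alternative.

    Assumption of this sketch: entries are either all None (the tuple has no
    confidence values) or all numbers with a positive sum.
    """
    n = len(confidences)
    if all(c is None for c in confidences):
        return [1.0 / n] * n  # no confidence values: 1/n each
    total = sum(confidences)  # s, the sum of the existing confidences
    return [c / total for c in confidences]  # c/s for each alternative


# Two alternatives without and with a '?'.
print(uniform_conf(2, False))
print(uniform_conf(2, True))

# Existing confidences 0.4 and 0.3 (as in Amy's tuple) sum to s = 0.7,
# so they are scaled to 0.4/0.7 and 0.3/0.7.
print([round(c, 3) for c in scaled_conf([0.4, 0.3])])
```

In both rules the assigned values of a tuple without a ‘?’ sum to 1; with a ‘?’, the uniform rule leaves 1/(n+1) unassigned.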
Topic | Contents
ULDBs | Uncertain attributes, maybe annotations, and confidence values
SQL over ULDBs | Selection, projection, join, subqueries, duplicate-elimination, grouping and aggregation, aggregate variants, set operators, order by
Flatten and GroupAlts | Flatten is used to turn tuples with alternative values into regular tuples, while GroupAlts is used to create or restructure alternative values
Horizontal subqueries: The [ ] construct | [ ] in the where clause, [ ] with joins, syntactic shortcuts in [ ], [ ] in the select clause, [ ] with self-joins
Built-in Functions Conf() and Maybe() | Multi-table Conf()
Result confidences | Result confidence evaluation, uniform and scaled result confidences, on-demand confidence computation
Built-in Predicate Lineage() | The Lineage() predicate lets queries filter joined tuples based on whether they are related via lineage
Options Nolineage, Noconf, and NoMaybe | Indicate that lineage, confidence values, and/or ?'s should be omitted from query results
Data modification | Insert statement, delete statement, update statement

10.0 Advantage and Disadvantage of using Trio DBMS

10.1 Advantage

10.1.1 Open source and free support for any non-profit users to experience the new Trio DBMS
The Trio DBMS was created by Stanford University in December 2006 and is updated very often with new plug-ins and Trio versions. If users encounter any problems, the supporters at Stanford University, such as Martin Theobald, provide solutions. There are also many resources available that help users understand the Trio DBMS in terms of design/architecture, components, query structure, and more.

10.1.2 Advanced components in relational DBMS
Current relational DBMSs, like Oracle, SQL, DB2, and more, handle well-structured data easily, but they do not handle uncertain data or lineage. In this respect the Trio DBMS is more advanced. In the Stanford Trio project, the ULDB was successfully built on top of PostgreSQL. I believe that later it could be implemented on any relational DBMS as a plug-in.

10.1.3 Computing confidences
Computing confidences is one of the functions of the Trio system. It is very important for calculating the probability of each tuple. From a marketing perspective, these probabilities give users a better basis for decision making, data analysis, and statistics.

10.1.4 Efficient, Convenient, Safe, Multi-User storage of and access to Massive, Persistent data [6]
The data is persistent and convenient to use because it is all stored in simple tables (“relations”) and is queried and updated through a simple but powerful declarative language (SQL). Data transactions make it safer, and multiple users can access it at the same time. The DBMS provides storage and indexing structures, which make it suitable for massive data and efficient access.

10.2 Disadvantage

10.2.1 Time cost for query
In the Trio DBMS, querying uncertainty, lineage, and confidences, or even running regular SQL, costs more time than in a regular relational DBMS, because Trio needs to create a super set of tuples, which requires several extra steps. In the example below, the ULDB takes four steps where the relational database takes only two, so the relational query is faster than the ULDB query. The comparison uses ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’ as the example query, following [6].

Over standard relational database: For each tuple in the cross-product of X1, X2, ..., Xn
1. Evaluate the predicate
2. If true, project attr-list to create the result tuple

Over ULDB: For each tuple in the cross-product of X1, X2, ..., Xn
1. Create a “super tuple” T from all combinations of alternatives
2. Evaluate the predicate on each alternative in T; keep only the true ones
3. Project attr-list on each alternative to create the result tuple
4. Handle the details: ‘?’, lineage, confidences

10.2.2 Dependency
Trio is not a complete DBMS on its own; it depends on an underlying relational DBMS such as PostgreSQL. TrioExplorer was built with Python and several other packages. When some of these packages were later updated to new versions, for instance Graphviz 2.16, they were no longer compatible with TrioExplorer, which causes problems. Therefore, keeping the Trio DBMS running requires effort to maintain, reconfigure, and update it.

10.2.3 On development stage
The Trio project has not yet been developed with all functionality, such as TrioExplorer features for DB administration, DB security, and more.

11.0 Documenting Trio System report/bugs
During the experiment with the Stanford Trio project, I went through the Trio installation, TrioExplorer, TrioPlus, and TriQL. Several places are mistaken, incorrect, or incomplete.

11.1. The installation instructions at http://dbpubs.stanford.edu:8011/doku.php/trio:installation do not state clearly which version of Graphviz the Trio system requires. The Graphviz website http://www.graphviz.org/ offers only version 2.16, which is not compatible; only version 2.14 works. Graphviz version 2.14 is available for download at http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe .

11.2. The Windows authentication superuser needs to be created in PostgreSQL first in order to connect to the database. Once the connection is established, TrioExplorer and TrioPlus can use the superuser's login and password as Windows authentication to access the database system. However, the installation procedure does not mention at all how to create this type of new user. The only workaround I found was to create the user manually with PostgreSQL->pgAdmin III.

11.3. After creating the superuser, I had to modify some code in setup.py in the Trio-1.0->setup directory in order to run ‘python setup.py -u myname -d myname -p’.

11.4. Many of the sample TriQL query statements at http://infolab.stanford.edu/~widom/triql.html#options do not work as described.
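For the missing superuser step noted in 11.2, one possible alternative to pgAdmin III is to run a CREATE ROLE statement from a psql session opened as an existing superuser. The Python sketch below only renders such a statement as text. It is an assumption, not part of the Trio installation instructions, and it was not tested against the Trio setup; the helper name create_superuser_sql and the example password are hypothetical.

```python
def create_superuser_sql(role_name, password):
    """Render a PostgreSQL CREATE ROLE statement for a superuser that can log in.

    The identifier and the string literal are quoted by doubling embedded
    quotes. Assumption of this sketch: the password contains no backslashes.
    """
    quoted_name = '"' + role_name.replace('"', '""') + '"'
    quoted_password = "'" + password.replace("'", "''") + "'"
    return "CREATE ROLE {} WITH SUPERUSER LOGIN PASSWORD {};".format(
        quoted_name, quoted_password
    )


# 'myname' mirrors the placeholder used in the setup.py command in 11.3;
# the password is a made-up example value.
statement = create_superuser_sql("myname", "change-me")
print(statement)
```

The printed statement would then be pasted into psql (or the pgAdmin III query window). Quoting the role name keeps its exact spelling and case, which matters if it has to match a Windows login name.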
12.0 Written Components

12.1 Turn-in Documents
Turn in the special interest activity task plan summary, the activity deliverables, standard minimal documentation of the hands-on work, and an electronic copy of the activity.

12.2 Demo and Presentation

12.2.1 Presentation of Trio database system architecture
Brief description of the Trio database system architecture, ULDB data model, Python API, TriQL query structure, features, step-by-step installation instructions, step-by-step Python server installation instructions, and more.

12.2.2 Demo
Launch the Python server, PostgreSQL, TrioExplorer, TrioPlus, and TriQL (create tables, indexes, insert, select, update, and more query statements).

12.3 Reflective written component
After spending many hours experimenting with the Trio project, I learned about Python (server, coding, and executing programs), PostgreSQL (how it differs from other DBMSs, installation, etc.), Trio (installation, architecture, TriQL, ULDB, TrioExplorer, TrioPlus), and more. During the experiment, I found that several things are very important for reaching my goal: support, resources, environment, and scale. Support means knowing where to get help when I encounter problems. Resources provide a basic picture of the entire system, so that I know what I am working on. Environment is where the project is run: what the operating system is, what the database server is, and more. The last one is scale. To complete the project on time, I have to know how big it is in order to manage my time. I feel that this was a great experience for building skills in troubleshooting, problem solving, and adopting new technology.

13.0 References
[1] M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January 2007. Demonstration description.
[2] O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006.
[3] Trio: A System for Integrated Management of Data, Uncertainty, and Lineage. Retrieved on November 18, 2007 from http://infolab.stanford.edu/trio/ .
[4] PostgreSQL. Retrieved on November 20, 2007 from http://www.postgresql.org/ .
[5] Python. Retrieved on November 15, 2007 from http://www.python.org/ .
[6] Trio: A System for Data, Uncertainty, and Lineage. Presentation (PowerPoint) given by Jennifer Widom at various venues, 2006-07.