Theseus Manual

Theseus relational operators
Introduction
This document summarizes the proposed relational operators for Theseus V2. It contains a list of all
proposed operators and examples of their use. It will eventually be merged into a final Theseus V2
system document.
Definitions
tuple: set of typed attribute/value pairs
relation: set of tuples
ATTRIBUTE TYPES
char: Sequence of characters
number: Strings interpreted as numbers with floating point precision
date: Strings interpreted as dates in format DD-MMM-YYYY (e.g., 01-JAN-2002)
dom: Document Object Model tree object
relation: Embedded relation
General notes
Operator implementations should never copy data. The system automatically provides a copy-on-write feature to ensure the safe parallel processing of common data.
Many operators (except PROJECT, etc.) perform dependent joins on the input relation.
Literals are automatically interpreted as relations of k tuples + EOS, containing 1 attribute. For
example:
print ("Tup 1#Tup 2" : )
would convert the literal into a stream of two tuples (with a single attribute "dummy"):
Tup 1
Tup 2
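The literal conversion can be pictured with a short Java sketch. It assumes, based only on the example above, that "#" separates the tuples and that the single attribute is named "dummy". The class and method names are hypothetical and not part of Theseus, and the EOS marker is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Theseus: models a literal as a relation.
final class LiteralSketch {
    // Splits a literal on '#' (assumed delimiter) into single-attribute tuples.
    static List<Map<String, String>> toRelation(String literal) {
        List<Map<String, String>> relation = new ArrayList<>();
        for (String value : literal.split("#")) {
            Map<String, String> tuple = new LinkedHashMap<>();
            tuple.put("dummy", value); // attribute name taken from the note above
            relation.add(tuple);
        }
        return relation;
    }
}
```

Calling LiteralSketch.toRelation("Tup 1#Tup 2") yields a list of two tuples, one per "#"-separated value.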
Operator Summary
Name             Purpose
wrapper [1]      Extracts web page data as packed relation
xwrapper [1]     Extracts web page data as an XML document
xml2rel [1]      Converts XML document into a relation
rel2xml          Converts a relation to an XML document
xquery [1]       Queries relational attributes that are XML documents
select           Filters tuples
project          Filters attributes
join             Combines relations horizontally
null             Tests if relation contains at least 1 tuple
union            Performs set union of two relations
minus            Performs set minus of two relations
distinct         Emits tuples that are distinct with respect to a set of attributes
rename           Renames one or more tuple attributes
tag [1]          Tags tuples with a unique ID
pack [2]         Accumulates a relation
unpack           Streams a relation
apply [1]        Applies a single-row function to a relation
aggregate [1,2]  Applies a multi-row function to a relation
format [1]       Create a formatted string based on attributes
dbquery          Fetches relation from DB based on query
dbappend         Appends relation to existing relation in DB (create if not found)
dbexport         Export relation to DB
dbupdate         Processes an update query (no results returned)
email            Emails data to valid e-mail address
fax              Faxes data to valid fax number
phone            Text message to valid cell phone
schedule         Schedules an agent task
unschedule       Unschedules an agent task
[1] = output is dependent join on incoming data
[2] = output includes packed relation
xml2rel
PURPOSE:
Extract XML document into relation
INPUT:
old_rel    Incoming relation
xml_attr   Name of existing attribute containing XML document
tag_attr   Name of tag attribute (XML documents have order, so this is necessary)
path       XML path at which to begin extraction
OUTPUT:
new_rel    Incoming relation joined with XML document + tag attribute
EXAMPLE:
r3 =
USER   PASSWORD   RESULT
greg   secret     <XML>
                  <STOCKS>
                  <SYMBOL>ORCL
                  <PRICE>15.50</PRICE>
                  </SYMBOL>
                  <SYMBOL>CSCO
                  <PRICE>21.50</PRICE>
                  </SYMBOL>
                  </STOCKS>
                  </XML>
xml2rel(r3, "result", "index", "XML/STOCKS" : r4)
r4 =
INDEX   USER   PASSWORD   RESULT                 SYMBOL   PRICE
1       greg   secret     <XML>                  ORCL     15.50
                          <STOCKS>
                          <SYMBOL>ORCL
                          <PRICE>15.50</PRICE>
                          </SYMBOL>
                          <SYMBOL>CSCO
                          <PRICE>21.50</PRICE>
                          </SYMBOL>
                          </STOCKS>
                          </XML>
2       greg   secret     <XML>                  CSCO     21.50
                          <STOCKS>
                          <SYMBOL>ORCL
                          <PRICE>15.50</PRICE>
                          </SYMBOL>
                          <SYMBOL>CSCO
                          <PRICE>21.50</PRICE>
                          </SYMBOL>
                          </STOCKS>
                          </XML>
rel2xml
PURPOSE:
Convert relation into XML
INPUT:
in_rel     Incoming relation
group_by   Attributes to group XML data by
new_attr   Name of new attribute
OUTPUT:
out_rel    New relation with one attribute (XML document)
NOTES:
If group_by is NULL (i.e., ""), then each tuple is embedded within <ROW></ROW>
Only primitive types (CHAR, NUMBER, and DATE) are permitted for in_rel
EXAMPLE:
r6 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
Greg        Cisco    CSCO   1000
rel2xml(r6, "portfolio, stock", "newdoc" : r7)
r7 =
NEWDOC
<xml>
  <portfolio>Greg
    <stock>Oracle
      <sym>ORCL</sym>
      <vol>5000000</vol>
    </stock>
    <stock>Cisco
      <sym>CSCO</sym>
      <vol>1000</vol>
    </stock>
  </portfolio>
</xml>
rel2xml(r6, "", "newdoc" : r7)
r7 =
NEWDOC
<xml>
  <ROW>
    <portfolio>Greg</portfolio>
    <stock>Oracle</stock>
    <sym>ORCL</sym>
    <vol>5000000</vol>
  </ROW>
  <ROW>
    <portfolio>Greg</portfolio>
    <stock>Cisco</stock>
    <sym>CSCO</sym>
    <vol>1000</vol>
  </ROW>
</xml>
xquery
PURPOSE:
Create a new XML document based on a collection of existing documents.
INPUT:
old_rel    Incoming relation
xml        XML document
query      Legal XQuery
new_attr   Name of new attribute containing document
OUTPUT:
new_rel    Incoming relation + new attribute (XML result of XQuery)
EXAMPLE:
xquery(r3, "myxml", "//company/size > 4000", "big_companies" : r4)
select
PURPOSE:
Perform relational select
INPUT:
in_rel      Incoming relation
condition   Filtering criteria
OUTPUT:
out_rel     Incoming relation minus the tuples that fail the condition
EXAMPLE:
r4 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
Greg        Cisco    CSCO   1000
select(r4, "vol > 10000" : r5)
r5 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
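The filtering step can be sketched in Java as follows. This is only an illustration: it models a relation as a list of attribute/value maps and takes the condition as a Java predicate, not as a condition string. How Theseus actually parses and evaluates conditions is not described here, and the names SelectSketch and select are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of the select operator; not the Theseus implementation.
final class SelectSketch {
    // Keeps only the tuples for which the condition holds.
    static List<Map<String, String>> select(List<Map<String, String>> inRel,
                                            Predicate<Map<String, String>> condition) {
        List<Map<String, String>> outRel = new ArrayList<>();
        for (Map<String, String> tuple : inRel) {
            if (condition.test(tuple)) {
                outRel.add(tuple); // the tuple object is shared, not copied
            }
        }
        return outRel;
    }
}
```

Under these assumptions, the example corresponds to SelectSketch.select(r4, t -> Double.parseDouble(t.get("VOL")) > 10000).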
project
PURPOSE:
Perform relational project
INPUT:
in_rel    Incoming relation
attrs     Attributes to project (comma delimited)
OUTPUT:
out_rel   Incoming relation modified to contain only the attributes projected
NOTES:
In addition to filtering columns, it also outputs them in the order specified.
EXAMPLE:
r5 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
Greg        Cisco    CSCO   1000
project(r5, "vol, sym" : r6)
r6 =
VOL       SYM
5000000   ORCL
1000      CSCO
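A minimal Java sketch of the projection, including the column reordering mentioned in the notes. It assumes attribute names are matched case-insensitively and stored in upper case, which the examples suggest but the text does not state. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the project operator; not the Theseus implementation.
final class ProjectSketch {
    // attrs is a comma-delimited attribute list, as in the operator's second argument.
    static List<Map<String, String>> project(List<Map<String, String>> inRel, String attrs) {
        List<Map<String, String>> outRel = new ArrayList<>();
        for (Map<String, String> tuple : inRel) {
            // LinkedHashMap preserves insertion order, so output columns follow attrs.
            Map<String, String> projected = new LinkedHashMap<>();
            for (String attr : attrs.split(",")) {
                String name = attr.trim().toUpperCase(Locale.ROOT); // assumed naming rule
                projected.put(name, tuple.get(name));
            }
            outRel.add(projected);
        }
        return outRel;
    }
}
```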
join
PURPOSE:
Perform relational join
INPUT:
lhs_rel     LHS relation
rhs_rel     RHS relation
condition   Join condition (LHS of expression is new attribute name)
OUTPUT:
out_rel     Joined relation
NOTES:
A condition of the string "nothing" will result in the Cartesian product
EXAMPLE:
r0 =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
r5 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
Greg        Cisco    CSCO   1000
join(r0, r5, "l.customer = r.portfolio" : r6)
r6 =
CUSTOMER   PHONE          PORTFOLIO   STOCK    SYM    VOL
Greg       310-555-1212   Greg        Oracle   ORCL   5000000
Greg       310-555-1212   Greg        Cisco    CSCO   1000
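The join can be pictured as a nested loop over both relations. The Java sketch below is an illustration only: it takes the join condition as a two-argument predicate instead of parsing a condition string, and it assumes the two relations have no attribute names in common. JoinSketch and join are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical nested-loop model of the join operator; not the Theseus implementation.
final class JoinSketch {
    static List<Map<String, String>> join(
            List<Map<String, String>> lhs,
            List<Map<String, String>> rhs,
            BiPredicate<Map<String, String>, Map<String, String>> condition) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : lhs) {
            for (Map<String, String> r : rhs) {
                if (condition.test(l, r)) {
                    // LHS attributes first, then RHS attributes.
                    Map<String, String> joined = new LinkedHashMap<>(l);
                    joined.putAll(r); // assumes attribute names do not collide
                    out.add(joined);
                }
            }
        }
        return out;
    }
}
```

A predicate that always returns true pairs every LHS tuple with every RHS tuple, which is the Cartesian product that the "nothing" condition is documented to produce.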
null
PURPOSE:
Tests if relation has at least one tuple
INPUT:
in_rel      Incoming relation
true_rel    Relation to route when incoming relation is null
false_rel   Relation to route when incoming relation is not null
OUTPUT:
ren_t_rel   Renamed TRUE relation
ren_f_rel   Renamed FALSE relation
Only 1 of these is produced
EXAMPLE:
rtest =
NAME
Bill
r0 =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
r1 =
PORTFOLIO   STOCK    SYM    VOL
Greg        Oracle   ORCL   5000000
Greg        Cisco    CSCO   1000
null(rtest, r0, r1 : rtrue, rfalse)
rtrue =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
union
PURPOSE:
Perform set union
INPUT:
lhs_rel   LHS relation
rhs_rel   RHS relation (type compatible with LHS)
OUTPUT:
out_rel   Unioned relation
NOTES:
Union usually only demands type compatibility, not attribute name compatibility. Programmers
can PROJECT as necessary to massage one relation into another suitable for UNION-ing with a
third. In Theseus, attribute lists have order; thus, to UNION, both relations are combined
positionally.
The attribute name chosen for each column is always based on that of the first relation.
EXAMPLE:
r0 =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
r1 =
CUSTOMER   PHONE2
Cust1      925-555-1212
Cust2      650-555-1212
Greg       310-555-1212
union(r0, r1 : r2)
r2 =
CUSTOMER   PHONE
Jane       213-555-1212
Cust1      925-555-1212
Cust2      650-555-1212
Greg       310-555-1212
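The positional nature of UNION can be sketched in Java by treating each tuple as an ordered list of values and ignoring attribute names entirely; the header of the first relation would then label the result. This is an illustration under that assumption, with hypothetical names. Since a relation is a set, the row order the sketch produces is not significant and may differ from the listing above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical positional model of the union operator; not the Theseus implementation.
final class UnionSketch {
    // Each row is a positional list of values; column names play no part in matching.
    static List<List<String>> union(List<List<String>> lhsRows, List<List<String>> rhsRows) {
        // List equality is element-wise, so identical rows collapse into one entry.
        Set<List<String>> rows = new LinkedHashSet<>(lhsRows);
        rows.addAll(rhsRows);
        return new ArrayList<>(rows);
    }
}
```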
minus
PURPOSE:
Perform set minus
INPUT:
lhs_rel   LHS relation
rhs_rel   RHS relation (type compatible with LHS)
OUTPUT:
out_rel   LHS relation minus the tuples that also appear in the RHS relation
EXAMPLE:
r0 =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
r1 =
CUSTOMER   PHONE
Cust1      925-555-1212
Cust2      650-555-1212
Greg       310-555-1212
minus(r0, r1 : r2)
r2 =
CUSTOMER   PHONE
Jane       213-555-1212
distinct
PURPOSE:
Emits tuples unique along a set of attributes
INPUT:
in_rel    Incoming relation
attribs   Comma-delimited list of attributes to ensure uniqueness over
OUTPUT:
out_rel   Incoming relation minus duplicate tuples (duplicate in terms of attribs)
EXAMPLE:
r1 =
CUSTOMER   PHONE          FAX
Cust1      925-555-1212   null
Cust2      650-555-1212   925-555-1212
Cust2      650-555-1212   925-555-1212
Cust2      650-555-1211   925-555-1212
Greg       310-555-1212   310-555-1212
distinct(r1, "customer, phone" : r2)
r2 =
CUSTOMER   PHONE          FAX
Cust1      925-555-1212   null
Cust2      650-555-1212   925-555-1212
Cust2      650-555-1211   925-555-1212
Greg       310-555-1212   310-555-1212
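One way to picture the duplicate elimination is to build a key from the listed attributes and emit a tuple only the first time its key is seen. The Java sketch below makes two assumptions the text does not confirm: the first tuple for each key is the one kept, and attribute names are upper-cased before lookup. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the distinct operator; not the Theseus implementation.
final class DistinctSketch {
    static List<Map<String, String>> distinct(List<Map<String, String>> inRel, String attribs) {
        Set<List<String>> seenKeys = new HashSet<>();
        List<Map<String, String>> outRel = new ArrayList<>();
        for (Map<String, String> tuple : inRel) {
            // The key is the list of values for the requested attributes, in order.
            List<String> key = new ArrayList<>();
            for (String attr : attribs.split(",")) {
                key.add(tuple.get(attr.trim().toUpperCase(Locale.ROOT)));
            }
            if (seenKeys.add(key)) { // add() returns true only for an unseen key
                outRel.add(tuple);
            }
        }
        return outRel;
    }
}
```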
rename
PURPOSE:
Renames tuple attributes
INPUT:
in_rel    Incoming relation
oldnew    Comma-delimited list of translations
OUTPUT:
out_rel   Incoming relation with modified attribute list
EXAMPLE:
r1 =
LAST_NAME   FIRST_NAME
Doe         John
Doe         Jane
rename(r1, "last_name lname, first_name fname" : r2)
r2 =
LNAME   FNAME
Doe     John
Doe     Jane
tag
PURPOSE:
Tags tuples with a unique ID
INPUT:
in_rel    Incoming relation
name      Name of prepended attribute
OUTPUT:
out_rel   Incoming relation + new tag attribute
NOTES:
This is primarily useful in preparing a relation to be PROJECTed and then JOINed.
No two tuples will ever have the same ID. Ever.
EXAMPLE:
r1 =
CUSTOMER   PHONE
Greg       310-555-1212
Jane       213-555-1212
tag(r1, "mytag" : r2)
r2 =
MYTAG   CUSTOMER   PHONE
1       Greg       310-555-1212
2       Jane       213-555-1212
pack
PURPOSE:
Embeds a relation in an attribute of a single tuple
INPUT:
in_rel      Incoming relation
attr_name   Name of new/existing tuple attribute with packed/unpacked relation
OUTPUT:
out_rel     New relation containing accumulated incoming relation
EXAMPLE:
r1 =
PRODUCT   PRICE
Hammer    3.59
Nail      0.99
pack(r1, "store" : r2)
r2 =
STORE
  PRODUCT   PRICE
  Hammer    3.59
  Nail      0.99
unpack
PURPOSE:
Streams an embedded relation
INPUT:
in_rel      Incoming relation
attr_name   Name of existing tuple attribute with packed relation
OUTPUT:
out_rel     Unpacked relation
EXAMPLE:
r1 =
STORE
  PRODUCT   PRICE
  Hammer    3.59
  Nail      0.99
unpack(r1, "store" : r2)
r2 =
PRODUCT   PRICE
Hammer    3.59
Nail      0.99
apply
PURPOSE:
Apply single-row function to a set of columns in a relation
INPUT:
in_rel      Incoming relation
func_call   Function call, including args that may be relation attributes
new_attrs   Names of new attributes containing single-row value
OUTPUT:
out_rel     Incoming relation + new attribute with single-row value
NOTES:
Single quotes are used to denote constants
All function parameters are Java Strings
All functions return an ArrayList that contains one or more Integer, Double, or String Java types
Plan writer must specify the proper number of new attribute names (i.e., the writer needs to know
how many values are returned by the function being called). For example, a SPLIT function
might return two values; in that case, the names of the returned attributes (2 of them) need to be
supplied by the plan writer.
EXAMPLE:
r1 =
PRODUCT   PRICE
Hammer    12.99
Nail      20.99
/* "my_greater" is a user-defined function (see Functions.java) */
apply(r1, "my_greater(price, '19.99')", "higher_price" : r2)
r2 =
PRODUCT   PRICE   HIGHER_PRICE
Hammer    12.99   19.99
Nail      20.99   20.99
--- file: Functions.java ---
import java.util.ArrayList;

public class Functions {
    // Returns the larger of its two numeric arguments as a one-element list.
    public static ArrayList<Object> my_greater(String s1, String s2) {
        ArrayList<Object> lst = new ArrayList<>();
        double lhs = Double.parseDouble(s1); // all parameters arrive as Strings
        double rhs = Double.parseDouble(s2);
        if (lhs >= rhs)
            lst.add(lhs); // autoboxed to Double
        else
            lst.add(rhs);
        return lst;
    }
}
aggregate
PURPOSE:
Apply aggregate function to a single column in a relation, outputting a packed relation.
INPUT:
in_rel      Incoming relation
func_call   Function call, including args that may be relation attributes
rel_attr    Name of attribute containing embedded relation
new_attr    New attribute containing result value
OUTPUT:
out_rel     Relation with two attributes (1st = original relation, 2nd = result of function)
NOTES:
No constants are allowed
Specified attributes will be accumulated and then marshalled to the function as an ArrayList
All ArrayList objects will have to be cast by the function implementer
o Thus, the function signature will consist of a String/Double/Integer return value and
parameters that are all ArrayList.
EXAMPLE:
r1 =
PRODUCT   PRICE
Hammer    3.59
Nail      0.99
aggregate(r1, "my_max(price)", "my_rel", "max_price" : r2)
r2 =
MY_REL                MAX_PRICE
  PRODUCT   PRICE     3.59
  Hammer    3.59
  Nail      0.99
--- file: Functions.java ---
import java.util.ArrayList;

public class Functions {
    // Returns the largest value in the accumulated column (0 if it is empty).
    public static Double my_max(ArrayList<?> a_data) {
        boolean firstTime = true;
        double max = 0;
        for (int i = 0; i < a_data.size(); i++) {
            // Elements may be Integer, Double, or String, so parse their text form.
            double tmp = Double.parseDouble(a_data.get(i).toString());
            if (firstTime) {
                firstTime = false;
                max = tmp;
            }
            else if (max < tmp) {
                max = tmp;
            }
        }
        return Double.valueOf(max);
    }
}
format
PURPOSE:
Format a string
INPUT:
in_rel     Incoming relation
fmt        Format picture
attrs      List of attributes to use in formatting
new_attr   New attribute containing formatted string
OUTPUT:
out_rel    Incoming relation + new attribute (formatted string)
EXAMPLE:
format(r1, "%s can be reached at %s", "customer, phone", "mesg" : r2)
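The example suggests that the format picture uses printf-style placeholders. Assuming that is the case, the per-tuple work can be sketched in Java with String.format. The sample tuple values, the upper-casing of attribute names, and the names FormatSketch and format are illustrative assumptions, not documented behavior.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the format operator for one tuple; not the Theseus implementation.
final class FormatSketch {
    // Fills the format picture with the values of the listed attributes, in order.
    static String format(Map<String, String> tuple, String fmt, String attrs) {
        String[] names = attrs.split(",");
        Object[] args = new Object[names.length];
        for (int i = 0; i < names.length; i++) {
            args[i] = tuple.get(names[i].trim().toUpperCase(Locale.ROOT)); // assumed naming rule
        }
        return String.format(fmt, args); // assumes printf-style "%s" placeholders
    }
}
```

For a tuple with CUSTOMER = Greg and PHONE = 310-555-1212, the sketch returns "Greg can be reached at 310-555-1212", which would become the value of the new attribute.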
dbquery
PURPOSE:
Retrieve tuples from external database using SQL query
INPUT:
query_expr    Legal SQL SELECT (no INSERT, UPDATE, DELETE)
conn_string   Connection string
username      User name
password      Password
OUTPUT:
out_rel       Relation containing query results
EXAMPLE:
dbquery("select name, phone from customers",
"jdbc:oracle:thin:@MyOracleDb", "scott", "tiger" : r1)
r1 =
NAME   PHONE
Joe    310-555-1212
dbappend
PURPOSE:
Appends tuples to an existing external database relation (creates if does not exist)
INPUT:
in_rel        Incoming data
conn_string   Connection string
username      User name
password      Password
relname       Name of outgoing relation
OUTPUT:
(none)
NOTES:
Only database-type attributes (CHAR and NUMBER) are allowed for in_rel.
EXAMPLE:
dbappend(r1, "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger", "employees" : )
dbexport
PURPOSE:
Export tuples from relation to external database
INPUT:
in_rel        Incoming relation
conn_string   Connection string
username      User name
password      Password
relname       Name of outgoing relation
OUTPUT:
(none)
NOTES:
Only database-type attributes (CHAR and NUMBER) are allowed for in_rel.
EXAMPLE:
dbexport(r1, "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger", "employees" : )
dbupdate
PURPOSE:
Executes an update query (no results returned).
INPUT:
query         Legal SQL query
conn_string   Connection string
username      User name
password      Password
OUTPUT:
(none)
EXAMPLE:
dbupdate("update employees set salary=1.1*salary",
"jdbc:oracle:thin:@MyOracleDb", "scott", "tiger" : )
email
PURPOSE:
Communicate relation via e-mail
INPUT:
in_rel     Incoming data
from       Email address of sender
to         Email address of recipient
subject    Subject line
mailhost   SMTP mailhost to use
prologue   Text at the start of the message
template   Template for how each tuple should be formatted
epilogue   Text at the end of the message
OUTPUT:
(none)
NOTES:
Delimiter for template is "$" at both the front and the back of the variable name; thus, binding a
variable to the template is $varname$
EXAMPLE:
email(r1,
"mr_bar@usc.edu",
"ms_foo@isi.edu",
"An important message from Mr. Bar",
"nitro.isi.edu",
"Ms. Foo,\n\nHere is the latest data you wanted:",
"NAME: $name$
ADDRESS: $street$, $city$, $state$",
"Best regards,\n Mr. Bar" : )
Ms. Foo,

Here is the latest data you wanted:
NAME: Sam
ADDRESS: 123 First St., Anytown, CA
NAME: Jen
ADDRESS: 123 Second St., Anytown, CA
NAME: Ken
ADDRESS: 123 Third St., Anytown, CA
Best regards,
Mr. Bar
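The $varname$ binding described in the notes can be sketched in Java with a regular expression. This is an illustration only. The class name, the lower-case attribute keys, and the choice to leave unknown variables untouched are assumptions; the text does not say how Theseus handles a variable with no matching attribute.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of template binding for one tuple; not the Theseus implementation.
final class TemplateSketch {
    // A variable is a run of word characters enclosed in "$" on both sides.
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)\\$");

    static String bind(String template, Map<String, String> tuple) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = tuple.get(m.group(1));
            // Unknown variables are left as written (an assumption).
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps "$" and "\" in the value from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Applying bind once per tuple and placing the results between the prologue and the epilogue gives a message body like the one shown above.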
fax
PURPOSE:
Communicate relation via fax
INPUT:
in_rel     Incoming data
to         Phone number of recipient fax machine
prologue   Text at the start of the message
template   Template for how each tuple should be formatted
epilogue   Text at the end of the message
OUTPUT:
(none)
EXAMPLE:
fax(r1,
"310-448-8471",
"Ms. Foo,\n\nHere is the latest data you wanted:",
"NAME: $name$
ADDRESS: $street$, $city$, $state$",
"Best regards,\n Mr. Bar" : )
Ms. Foo,

Here is the latest data you wanted:
NAME: Sam
ADDRESS: 123 First St., Anytown, CA
NAME: Jen
ADDRESS: 123 Second St., Anytown, CA
NAME: Ken
ADDRESS: 123 Third St., Anytown, CA
Best regards,
Mr. Bar