Theseus relational operators

Introduction

This document summarizes the proposed relational operators for Theseus V2. It contains a list of all proposed operators and examples of their use. It will eventually be merged into a final Theseus V2 system document.

Definitions

  tuple:     set of typed attribute/value pairs
  relation:  set of tuples

ATTRIBUTE TYPES

  char:      Sequence of characters
  number:    Strings interpreted as numbers with floating point precision
  date:      Strings interpreted as dates in format DD-MMM-YYYY (e.g., 01-JAN-2002)
  dom:       Document Object Model tree object
  relation:  Embedded relation

General notes

- Operator implementations should never copy data. The system automatically provides a copy-on-write feature to ensure the safe parallel processing of common data.
- Many operators (except PROJECT, etc.) perform dependent joins on the input relation.
- Literals are automatically interpreted as relations of k tuples + EOS, containing 1 attribute. For example:

      print ("Tup 1#Tup 2'" : )

  would convert the second input into a stream of two tuples (with a single attribute "dummy"):

      Tup 1
      Tup 2

Operator Summary

  Name             Purpose
  wrapper [1]      Extracts web page data as packed relation
  xwrapper [1]     Extracts web page data as an XML document
  xml2rel [1]      Converts XML document into a relation
  rel2xml          Converts a relation to an XML document
  xquery [1]       Queries relational attributes that are XML documents
  select           Filters tuples
  project          Filters attributes
  join             Combines relations horizontally
  null             Tests if relation contains at least 1 tuple
  union            Performs set union of two relations
  minus            Performs set minus of two relations
  distinct         Emits tuples that are distinct with respect to a set of attributes
  rename           Renames one or more tuple attributes
  tag [1]          Tags tuples with a unique ID
  pack [2]         Accumulates a relation
  unpack           Streams a relation
  apply [1]        Applies a single-row function to a relation
  aggregate [1,2]  Applies a multi-row function to a relation
  format [1]       Create a formatted string based on attributes
  dbquery          Fetches relation from DB based on query
  dbappend         Appends relation to existing relation in DB (create if not found)
  dbexport         Export relation to DB
  dbupdate         Processes an update query (no results returned)
  email            Emails data to valid e-mail address
  fax              Faxes data to valid fax number
  phone            Text message to valid cell phone
  schedule         Schedules an agent task
  unschedule       Unschedules an agent task

  [1] = output is dependent join on incoming data
  [2] = output includes packed relation

xml2rel

PURPOSE: Extract XML document into relation

INPUT:
  old_rel   Incoming relation
  xml_attr  Name of existing attribute containing XML document
  tag_attr  Name of tag attribute (XML documents have order, so this is necessary)
  path      XML path at which to begin extraction

OUTPUT:
  new_rel   Incoming relation joined with XML document + tag attribute

EXAMPLE:

r3 =
  USER  PASSWORD  RESULT
  greg  secret    <XML>
                   <STOCKS>
                    <SYMBOL>ORCL
                     <PRICE>15.50</PRICE>
                    </SYMBOL>
                    <SYMBOL>CSCO
                     <PRICE>21.50</PRICE>
                    </SYMBOL>
                   </STOCKS>
                  </XML>

xml2rel(r3, "result", "index", "XML/STOCKS" : r4)

r4 =
  INDEX  USER  PASSWORD  RESULT                    SYMBOL  PRICE
  1      greg  secret    <XML>                     ORCL    15.50
                          <STOCKS>
                           <SYMBOL>ORCL
                            <PRICE>15.50</PRICE>
                           </SYMBOL>
                           <SYMBOL>CSCO
                            <PRICE>21.50</PRICE>
                           </SYMBOL>
                          </STOCKS>
                         </XML>
  2      greg  secret    <XML>                     CSCO    21.50
                          <STOCKS>
                           <SYMBOL>ORCL
                            <PRICE>15.50</PRICE>
                           </SYMBOL>
                           <SYMBOL>CSCO
                            <PRICE>21.50</PRICE>
                           </SYMBOL>
                          </STOCKS>
                         </XML>

rel2xml

PURPOSE: Convert relation into XML

INPUT:
  in_rel    Incoming relation
  group_by  Attributes to group XML data by
  new_attr  Name of new attribute

OUTPUT:
  out_rel   New relation with one attribute (XML document)

NOTES:
- If group_by is NULL (i.e., ""), then each tuple is embedded within <ROW></ROW>
- Only primitive types (CHAR, NUMBER, and DATE) are permitted for in_rel

EXAMPLE:

r6 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000
  Greg       Cisco   CSCO  1000

rel2xml(r6, "portfolio, stock", "newdoc" : r7)

r7 =
  NEWDOC
  <xml>
   <portfolio>Greg
    <stock>Oracle
     <sym>ORCL</sym>
     <vol>5000000</vol>
    </stock>
    <stock>Cisco
     <sym>CSCO</sym>
     <vol>1000</vol>
    </stock>
   </portfolio>
  </xml>

rel2xml(r6, "", "newdoc" : r7)

r7 =
  NEWDOC
  <xml>
   <ROW>
    <portfolio>Greg</portfolio>
    <stock>Oracle</stock>
    <sym>ORCL</sym>
    <vol>5000000</vol>
   </ROW>
   <ROW>
    <portfolio>Greg</portfolio>
    <stock>Cisco</stock>
    <sym>CSCO</sym>
    <vol>1000</vol>
   </ROW>
  </xml>

xquery

PURPOSE: Create a new XML document based on a collection of existing documents.

INPUT:
  old_rel   Incoming relation
  xml       XML document
  query     Legal XQuery
  new_attr  Name of new attribute containing document

OUTPUT:
  new_rel   Incoming relation + new attribute (XML result of XQuery)

EXAMPLE:

xquery(r3, "myxml", "//company/size > 4000", "big_companies" : r4)

select

PURPOSE: Perform relational select

INPUT:
  in_rel     Incoming relation
  condition  Filtering criteria

OUTPUT:
  out_rel    Incoming relation modulo tuples failing the WHERE clause

EXAMPLE:

r4 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000
  Greg       Cisco   CSCO  1000

select(r4, "vol > 10000" : r5)

r5 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000

project

PURPOSE: Perform relational project

INPUT:
  in_rel   Incoming relation
  attrs    Attributes to project (comma-delimited)

OUTPUT:
  out_rel  Incoming relation modified to contain only the attributes projected

NOTES:
- In addition to filtering columns, it also outputs them in the order specified.

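The column filtering and reordering described in the project NOTES can be illustrated with a minimal Java sketch. This is not the Theseus implementation: the class ProjectSketch, the tuple() helper, and the modelling of a tuple as an insertion-ordered map are all hypothetical, and matching attribute names case-insensitively is an assumption based on the examples in this document. The sketch also ignores streaming and the copy-on-write rule from the General notes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Theseus.
class ProjectSketch {

    // Keeps only the attributes named in the comma-delimited attrs string
    // and emits them in the order in which attrs lists them.
    static List<Map<String, String>> project(List<Map<String, String>> inRel, String attrs) {
        List<Map<String, String>> outRel = new ArrayList<>();
        for (Map<String, String> tuple : inRel) {
            // LinkedHashMap preserves insertion order, which gives the column order
            Map<String, String> projected = new LinkedHashMap<>();
            for (String raw : attrs.split(",")) {
                String name = raw.trim().toUpperCase(); // assumption: names are case-insensitive
                if (!tuple.containsKey(name)) {
                    throw new IllegalArgumentException("Unknown attribute: " + name);
                }
                projected.put(name, tuple.get(name));
            }
            outRel.add(projected);
        }
        return outRel;
    }

    // Builds a tuple from alternating attribute-name/value arguments.
    static Map<String, String> tuple(String... pairs) {
        Map<String, String> t = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            t.put(pairs[i].toUpperCase(), pairs[i + 1]);
        }
        return t;
    }
}
```

Calling ProjectSketch.project(rel, "vol, sym") on tuples holding PORTFOLIO, STOCK, SYM, and VOL yields tuples whose attributes are VOL followed by SYM, in that order.
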
EXAMPLE:

r5 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000
  Greg       Cisco   CSCO  1000

project(r5, "vol, sym" : r6)

r6 =
  VOL      SYM
  5000000  ORCL
  1000     CSCO

join

PURPOSE: Perform relational join

INPUT:
  lhs_rel    LHS relation
  rhs_rel    RHS relation
  condition  Join condition (LHS of expression is new attribute name)

OUTPUT:
  out_rel    Joined relation

NOTES:
- A condition of the string "nothing" will result in the Cartesian product

EXAMPLE:

r0 =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

r5 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000
  Greg       Cisco   CSCO  1000

join(r0, r5, "l.customer = r.portfolio" : r6)

r6 =
  CUSTOMER  PHONE         PORTFOLIO  STOCK   SYM   VOL
  Greg      310-555-1212  Greg       Oracle  ORCL  5000000
  Greg      310-555-1212  Greg       Cisco   CSCO  1000

null

PURPOSE: Tests if relation has at least one tuple

INPUT:
  in_rel     Incoming relation
  true_rel   Relation to route when incoming relation is null
  false_rel  Relation to route when incoming relation is not null

OUTPUT:
  ren_t_rel  Renamed TRUE relation
  ren_f_rel  Renamed FALSE relation

EXAMPLE:

rtest =
  NAME
  Bill

r0 =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

r1 =
  PORTFOLIO  STOCK   SYM   VOL
  Greg       Oracle  ORCL  5000000
  Greg       Cisco   CSCO  1000

null(rtest, r0, r1 : rtrue, rfalse)

(Only 1 of rtrue and rfalse is produced.)

rtrue =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

union

PURPOSE: Perform set union

INPUT:
  lhs_rel  LHS relation
  rhs_rel  RHS relation (type compatible with LHS)

OUTPUT:
  out_rel  Unioned relation

NOTES:
- Union usually only demands type compatibility, not attribute name compatibility. Programmers can PROJECT as necessary to massage one relation into another suitable for UNION-ing with a third.
- In Theseus, attribute lists have order; thus, to UNION, both relations are combined positionally. The attribute name chosen for each column is always based on that of the first relation.

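The positional behavior described in the union NOTES can be sketched in Java as follows. This is an illustration under stated assumptions, not the Theseus implementation: the UnionSketch class and its Relation type are hypothetical, every value is modelled as a String (so type compatibility is reduced to an attribute-count check), and the order of the output tuples is an artifact of the sketch rather than something this document specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Theseus.
class UnionSketch {

    // A relation modelled as an ordered attribute list plus rows of values.
    static final class Relation {
        final List<String> attrs;
        final List<List<String>> rows;

        Relation(List<String> attrs, List<List<String>> rows) {
            this.attrs = attrs;
            this.rows = rows;
        }
    }

    // Combines two relations column by column (positionally). The result
    // takes its attribute names from lhs; the names in rhs are ignored.
    static Relation union(Relation lhs, Relation rhs) {
        if (lhs.attrs.size() != rhs.attrs.size()) {
            throw new IllegalArgumentException("Relations must have the same number of attributes");
        }
        // Lists compare by value, so the set drops duplicate tuples
        Set<List<String>> seen = new LinkedHashSet<>(lhs.rows);
        seen.addAll(rhs.rows);
        return new Relation(lhs.attrs, new ArrayList<>(seen));
    }
}
```

With a left relation named (CUSTOMER, PHONE) and a right relation named (CUSTOMER, PHONE2), the result is named (CUSTOMER, PHONE), and a tuple that occurs in both inputs appears only once.
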
EXAMPLE:

r0 =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

r1 =
  CUSTOMER  PHONE2
  Cust1     925-555-1212
  Cust2     650-555-1212
  Greg      310-555-1212

union(r0, r1 : r2)

r2 =
  CUSTOMER  PHONE
  Jane      213-555-1212
  Cust1     925-555-1212
  Cust2     650-555-1212
  Greg      310-555-1212

minus

PURPOSE: Perform set minus

INPUT:
  lhs_rel  LHS relation
  rhs_rel  RHS relation (type compatible with LHS)

OUTPUT:
  out_rel  LHS relation modulo tuples in RHS relation

EXAMPLE:

r0 =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

r1 =
  CUSTOMER  PHONE
  Cust1     925-555-1212
  Cust2     650-555-1212
  Greg      310-555-1212

minus(r0, r1 : r2)

r2 =
  CUSTOMER  PHONE
  Jane      213-555-1212

distinct

PURPOSE: Emits tuples unique along a set of attributes

INPUT:
  in_rel   Incoming relation
  attribs  Comma-delimited list of attributes to ensure uniqueness over

OUTPUT:
  out_rel  Incoming relation modulo duplicate tuples (duplicate in terms of attribs)

EXAMPLE:

r1 =
  CUSTOMER  PHONE         FAX
  Cust1     925-555-1212  null
  Cust2     650-555-1212  925-555-1212
  Cust2     650-555-1212  925-555-1212
  Cust2     650-555-1211  925-555-1212
  Greg      310-555-1212  310-555-1212

distinct(r1, "customer, phone" : r2)

r2 =
  CUSTOMER  PHONE         FAX
  Cust1     925-555-1212  null
  Cust2     650-555-1212  925-555-1212
  Cust2     650-555-1211  925-555-1212
  Greg      310-555-1212  310-555-1212

rename

PURPOSE: Renames tuple attributes

INPUT:
  in_rel   Incoming relation
  oldnew   Comma-delimited list of translations

OUTPUT:
  out_rel  Incoming relation with modified attribute list

EXAMPLE:

r1 =
  LAST_NAME  FIRST_NAME
  Doe        John
  Doe        Jane

rename(r1, "last_name lname, first_name fname" : r2)

r2 =
  LNAME  FNAME
  Doe    John
  Doe    Jane

tag

PURPOSE: Tags tuples with a unique ID

INPUT:
  in_rel   Incoming relation
  name     Name of prepended attribute

OUTPUT:
  out_rel  Incoming relation + new tag attribute

NOTES:
- This is primarily useful in preparing a relation to be PROJECTed and then JOINed.
- No two tuples will ever have the same ID. Ever.

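One way to picture the uniqueness guarantee in the tag NOTES is a single shared counter that is never reset. The Java sketch below is only an illustration: the TagSketch class is hypothetical, rows are modelled as lists of Strings without attribute names (so the name parameter of tag is omitted), and uniqueness holds only within one running JVM, which may be weaker than what Theseus actually guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of Theseus.
class TagSketch {

    // Shared, never-reset counter: no ID is handed out twice in this JVM,
    // even when tag() is called from several threads.
    private static final AtomicLong NEXT_ID = new AtomicLong(1);

    // Returns a new relation in which every row is prefixed with a fresh ID.
    static List<List<String>> tag(List<List<String>> inRel) {
        List<List<String>> outRel = new ArrayList<>();
        for (List<String> row : inRel) {
            List<String> tagged = new ArrayList<>(row.size() + 1);
            tagged.add(Long.toString(NEXT_ID.getAndIncrement())); // prepended tag value
            tagged.addAll(row);
            outRel.add(tagged);
        }
        return outRel;
    }
}
```

Because the ID travels with each tuple, pieces of a tagged relation that are PROJECTed apart can later be JOINed back together on the tag attribute.
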
EXAMPLE:

r1 =
  CUSTOMER  PHONE
  Greg      310-555-1212
  Jane      213-555-1212

tag(r1, "mytag" : r2)

r2 =
  MYTAG  CUSTOMER  PHONE
  1      Greg      310-555-1212
  2      Jane      213-555-1212

pack

PURPOSE: Embeds a relation in an attribute of a single tuple

INPUT:
  in_rel     Incoming relation
  attr_name  Name of new/existing tuple attribute with packed/unpacked relation

OUTPUT:
  out_rel    New relation containing accumulated incoming relation

EXAMPLE:

r1 =
  PRODUCT  PRICE
  Hammer   3.59
  Nail     0.99

pack(r1, "store" : r2)

r2 =
  STORE
    PRODUCT  PRICE
    Hammer   3.59
    Nail     0.99

unpack

PURPOSE: Streams an embedded relation

INPUT:
  in_rel     Incoming relation
  attr_name  Name of existing tuple attribute with packed relation

OUTPUT:
  out_rel    Unpacked relation

EXAMPLE:

r1 =
  STORE
    PRODUCT  PRICE
    Hammer   3.59
    Nail     0.99

unpack(r1, "store" : r2)

r2 =
  PRODUCT  PRICE
  Hammer   3.59
  Nail     0.99

apply

PURPOSE: Apply single-row function to a set of columns in a relation

INPUT:
  in_rel     Incoming relation
  func_call  Function call, including args that may be relation attributes
  new_attrs  Names of new attributes containing single-row value

OUTPUT:
  out_rel    Incoming relation + new attribute with single-row value

NOTES:
- Single quotes are used to denote constants
- All function parameters are Java Strings
- All functions return an ArrayList that contains one or more Integer, Double, or String Java types
- Plan writer must specify the proper number of new attribute names (i.e., the writer needs to know how many values are returned by the function being called). For example, a SPLIT function might return two values; in that case, the names of the two returned attributes need to be supplied by the plan writer.

EXAMPLE:

r1 =
  PRODUCT  PRICE
  Hammer   12.99
  Nail     20.99

/* "greater" is a built-in function */
apply(r1, "my_greater(price, '19.99')", "higher_price" : r2)

r2 =
  PRODUCT  PRICE  HIGHER_PRICE
  Hammer   12.99  19.99
  Nail     20.99  20.99

--- file: Functions.java ---
...
    import java.util.ArrayList;

    // Returns a one-element list holding the larger of the two numeric arguments.
    public static ArrayList<Object> my_greater (String s1, String s2)
    {
        ArrayList<Object> lst = new ArrayList<>();
        Double lhs = Double.valueOf(s1);
        Double rhs = Double.valueOf(s2);
        if (lhs.doubleValue() >= rhs.doubleValue())
            lst.add(lhs);
        else
            lst.add(rhs);
        return lst;
    }

aggregate

PURPOSE: Apply aggregate function to a single column in a relation, outputting a packed relation.

INPUT:
  in_rel     Incoming relation
  func_call  Function call, including args that may be relation attributes
  rel_attr   Name of attribute containing embedded relation
  new_attr   New attribute containing result value

OUTPUT:
  out_rel    Relation with two attributes (1st = original relation, 2nd = result of function)

NOTES:
- No constants are allowed
- Specified attributes will be accumulated and then marshalled to the function as an ArrayList
- All ArrayList objects will have to be cast by the function implementer
  o Thus, the function signature will consist of a String/Double/Integer return value and parameters that are all ArrayList.

EXAMPLE:

r1 =
  PRODUCT  PRICE
  Hammer   3.59
  Nail     0.99

aggregate(r1, "my_max(price)", "my_rel", "max_price" : r2)

r2 =
  MY_REL             MAX_PRICE
    PRODUCT  PRICE   3.59
    Hammer   3.59
    Nail     0.99

--- file: Functions.java ---

    import java.util.ArrayList;

    // Returns the largest value in a_data (0 if a_data is empty).
    public static Double my_max (ArrayList<?> a_data)
    {
        boolean firstTime = true;
        double max = 0;
        for (int i = 0; i < a_data.size(); i++)
        {
            // Elements arrive untyped, so parse each one from its string form
            double tmp = Double.parseDouble(a_data.get(i).toString());
            if (firstTime)
            {
                firstTime = false;
                max = tmp;
            }
            else if (max < tmp)
            {
                max = tmp;
            }
        }
        return Double.valueOf(max);
    }

format

PURPOSE: Format a string

INPUT:
  in_rel    Incoming relation
  fmt       Format picture
  attrs     List of attributes to use in formatting
  new_attr  New attribute containing formatted string

OUTPUT:
  out_rel   Incoming relation + new attribute (formatted string)

EXAMPLE:

format(r1, "%s can be reached at %s", "customer, phone", "mesg" : r2)

dbquery

PURPOSE: Retrieve tuples from external database using SQL query

INPUT:
  query_expr   Legal SQL SELECT (no INSERT, UPDATE, DELETE)
  conn_string  Connection string
  username     User name
  password     Password

OUTPUT:
  out_rel      Relation containing query results

EXAMPLE:

dbquery("select name, phone from customers", "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger" : r1)

r1 =
  NAME  PHONE
  Joe   310-555-1212

dbappend

PURPOSE: Appends tuples to an existing external database relation (creates if does not exist)

INPUT:
  in_rel       Incoming data
  conn_string  Connection string
  username     User name
  password     Password
  relname      Name of outgoing relation

OUTPUT: (none)

NOTES:
- Only database-type attributes (CHAR and NUMBER) are allowed for in_rel.

EXAMPLE:

dbappend(r1, "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger", "employees" : )

dbexport

PURPOSE: Export tuples from relation to external database

INPUT:
  in_rel       Incoming relation
  conn_string  Connection string
  username     User name
  password     Password
  relname      Name of outgoing relation

OUTPUT: (none)

NOTES:
- Only database-type attributes (CHAR and NUMBER) are allowed for in_rel.

EXAMPLE:

dbexport(r1, "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger", "employees" : )

dbupdate

PURPOSE: Executes an update query (no results returned).

INPUT:
  query        Legal SQL query
  conn_string  Connection string
  username     User name
  password     Password

OUTPUT: (none)

EXAMPLE:

dbupdate("update employees set salary=1.1*salary", "jdbc:oracle:thin:@MyOracleDb", "scott", "tiger" : )

email

PURPOSE: Communicate relation via e-mail

INPUT:
  in_rel    Incoming data
  from      Email address of sender
  to        Email address of recipient
  subject   Subject line
  mailhost  SMTP mailhost to use
  prologue  Text at the start of the message
  template  Template for how each tuple should be formatted
  epilogue  Text at the end of the message

OUTPUT: (none)

NOTES:
- Delimiter for template is "$" at both the front and the back of the variable name; thus, binding a variable to the template is $varname$

EXAMPLE:

email(r1, "mr_bar@usc.edu", "ms_foo@isi.edu",
      "An important message from Mr. Bar",
      "nitro.isi.edu",
      "Ms. Foo,\n\nHere is the latest data you wanted:",
      "NAME: $name$ ADDRESS: $street$, $city$, $state$",
      "Best regards,\n Mr. Bar" : )

Resulting message:

  Ms. Foo,

  Here is the latest data you wanted:
  NAME: Sam  ADDRESS: 123 First St., Anytown, CA
  NAME: Jen  ADDRESS: 123 Second St., Anytown, CA
  NAME: Ken  ADDRESS: 123 Third St., Anytown, CA
  Best regards,
  Mr. Bar

fax

PURPOSE: Communicate relation via fax

INPUT:
  in_rel    Incoming data
  to        Phone number of recipient fax machine
  prologue  Text at the start of the message
  template  Template for how each tuple should be formatted
  epilogue  Text at the end of the message

OUTPUT: (none)

EXAMPLE:

fax(r1, "310-448-8471",
    "Ms. Foo,\n\nHere is the latest data you wanted:",
    "NAME: $name ADDRESS: $street, $city, $state",
    "Best regards,\n Mr. Bar" : )

Resulting message:

  Ms. Foo,

  Here is the latest data you wanted:
  NAME: Sam  ADDRESS: 123 First St., Anytown, CA
  NAME: Jen  ADDRESS: 123 Second St., Anytown, CA
  NAME: Ken  ADDRESS: 123 Third St., Anytown, CA
  Best regards,
  Mr. Bar
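
The $varname$ template binding described in the email NOTES could be implemented along the lines of the Java sketch below. It is a minimal illustration, not the Theseus code: the TemplateSketch class and its render and message methods are hypothetical, a tuple is modelled as a map from lowercase attribute name to value, unknown variables are left untouched, and joining the prologue, tuple lines, and epilogue with newlines is an assumption. The sketch follows the email convention only (a "$" on both sides of the name).

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Theseus.
class TemplateSketch {

    // A variable is a run of word characters with "$" on both sides.
    private static final Pattern VAR = Pattern.compile("\\$(\\w+)\\$");

    // Replaces each $varname$ in the template with that attribute's value.
    static String render(String template, Map<String, String> tuple) {
        Matcher m = VAR.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = tuple.get(m.group(1));
            if (value == null) {
                value = m.group(); // assumption: unknown variables are left as written
            }
            // quoteReplacement keeps "$" and "\" in values from being misread
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Builds a message body: prologue, one rendered line per tuple, epilogue.
    static String message(String prologue, String template, String epilogue,
                          List<Map<String, String>> rel) {
        StringBuilder sb = new StringBuilder(prologue).append('\n');
        for (Map<String, String> tuple : rel) {
            sb.append(render(template, tuple)).append('\n');
        }
        return sb.append(epilogue).toString();
    }
}
```

Applied to the template "NAME: $name$ ADDRESS: $street$, $city$, $state$" and a tuple for Sam, render produces the line "NAME: Sam ADDRESS: 123 First St., Anytown, CA".
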