CS157B Lecture Notes
From January 25 to April 03
Hoang, Luong
CS157B Section 04
Dr. Lin

Weeks of January 25 – February 12: no lecture notes were assigned until February 12.

Week of Feb 12 - Feb 21, 2006

WEEKLY LECTURE NOTES: the following and the attachment are all the commands I did in SQL, following all the statements from sql_review.doc, as Professor Lin told us to do for this week's required LECTURE NOTES.

-------------------------------------------------------------------------------

SQL*Plus: Release 8.1.7.0.0 - Production on Sun Feb 19 13:05:01 2006

(c) Copyright 2000 Oracle Corporation.  All rights reserved.

Connected to:
Oracle8i Enterprise Edition Release 8.1.7.0.0 - Production
With the Partitioning option
JServer Release 8.1.7.0.0 - Production

SQL> SELECT *
  2  FROM SUPPLIERS
  3  ;

S# SNAME          STATUS CITY
-- ---------- ---------- ----------
S1 SMITH              20 LONDON
S2 JONES              10 PARIS
S3 BLAKE              30 PARIS
S4 CLARK              20 LONDON
S5 ADAMS              30 ATHENS

SQL> SELECT *
  2  FROM PARTs
  3  ;

P# PNAME    COLOR      WEIGHT CITY
-- -------- ------ ---------- ----------
P1 NUT      RED            12 LONDON
P2 BOLT     GREEN          17 PARIS
P3 SCREW    BLUE           17 ROME
P4 SCREW    RED            14 LONDON
P5 CAM      BLUE           12 PARIS
P6 COG      RED            19 LONDON

6 rows selected.

SQL> SELECT P# FROM SHIPMENTS;

P#
--
P1
P2
P3
P4
P5
P6
P1
P2
P2
P2
P4

P#
--
P5

12 rows selected.

SQL> SELECT DISTINCT P# FROM SHIPMENTS;

P#
--
P1
P2
P3
P4
P5
P6

6 rows selected.

SQL> SELECT COUNT(*)
  2  FROM SUPPLIERS
  3  ;

  COUNT(*)
----------
         5

SQL> SELECT COUNT(Status)
  2  From Suppliers
  3  ;

COUNT(STATUS)
-------------
            5

SQL> SELECT COUNT(Distinct Status)
  2  FROM SUPPLIERS;

COUNT(DISTINCTSTATUS)
---------------------
                    3

SQL> SELECT COUNT(*) FROM SHIPMENTS WHERE P# = 'P2';

  COUNT(*)
----------
         4

SQL> SELECT P#, SUM(QTY)
  2  FROM SHIPMENTS
  3  GROUP BY P#;

P#   SUM(QTY)
-- ----------
P1        600
P2       1000
P3        400
P4        500
P5        500
P6        100

6 rows selected.

SQL> Select employee.ssn, employee.dno
  2  from employee, department
  3  where employee.prikey=department.mgrprikey
  4  ;

SSN               DNO
---------- ----------
333445555           5
987654321           4
888665555           4

SQL> Select E.dno, E.prikey, E.salary
  2  from employee E, department
  3  where E.prikey=department.mgrprikey;

       DNO PRIKEY           SALARY
---------- ------------ ----------
         5 2                 40000
         4 4                 43000
         4 8                 25000

SQL> Select E.dno,count(E.salary)
  2  from employee E
  3  group by E.dno
  4  having count(E.salary) >1;

       DNO COUNT(E.SALARY)
---------- ---------------
         4               4
         5               4

SQL> select E.dno, E.salary , Count(*)
  2  from employee E
  3  group by E.dno, E.salary
  4  having count(*) >1 ;

       DNO     SALARY   COUNT(*)
---------- ---------- ----------
         4      25000          3

SQL> select E.salary, count(E.dno)
  2  from employee E
  3  group by E.salary
  4  having count(E.dno) >1;

    SALARY COUNT(E.DNO)
---------- ------------
     25000            4

SQL> select count(E.dno), E.prikey
  2  from employee E
  3  group by E.prikey
  4  having count(E.dno) >1;

no rows selected

SQL> Select employee.dno,
  2  avg(employee.salary) as ave
  3  from employee, department
  4  where employee.prikey=department.mgrprikey
  5  group by employee.dno
  6  having avg (employee.salary ) >10;

       DNO        AVE
---------- ----------
         4      34000
         5      40000

SQL> select E.dno, avg(E.salary) as ave
  2  from employee E, department
  3  where E.prikey=department.mgrprikey
  4  group by E.dno
  5  having avg(E.salary) >10;

       DNO        AVE
---------- ----------
         4      34000
         5      40000

SQL> Select F.fname, F.minit, F.lname,
  2  F.salary as mgr_salary,
  3  F.dno, G.dept_avg
  4  from Department D,
  5  Employee F,
  6  (select E.dno as edno,
  7  avg(E.salary) as dept_avg
  8  from employee E
  9  group by E.dno) G
 10  where D.dnumber= G.edno
 11  and F.prikey=D.mgrprikey
 12  and F.salary > dept_avg;

FNAME      M LNAME      MGR_SALARY        DNO   DEPT_AVG
---------- - ---------- ---------- ---------- ----------
FRANKLIN   T WONG            40000          5      33250
JENNIFER   S WALLACE         43000          4      29500

SQL> SELECT S.*, P.*
  2  FROM SUPPLIERS S, PARTS P
  3  WHERE S.CITY='London' and P.CITY='Paris';

no rows selected

SQL> SELECT SUPPLIERS.S#, SUPPLIERS.SNAME, SUPPLIERS.STATUS, SUPPLIERS.CITY, PARTS.P#, PARTS.PNAME, PARTS.COLOR, PARTS.WEIGHT
  2  FROM SUPPLIERS, PARTS
  3  WHERE SUPPLIERS.CITY = PARTS.CITY
  4  ;

S# SNAME          STATUS CITY       P# PNAME    COLOR      WEIGHT
-- ---------- ---------- ---------- -- -------- ------ ----------
S1 SMITH              20 LONDON     P1 NUT      RED            12
S4 CLARK              20 LONDON     P1 NUT      RED            12
S1 SMITH              20 LONDON     P4 SCREW    RED            14
S4 CLARK              20 LONDON     P4 SCREW    RED            14
S1 SMITH              20 LONDON     P6 COG      RED            19
S4 CLARK              20 LONDON     P6 COG      RED            19
S2 JONES              10 PARIS      P2 BOLT     GREEN          17
S3 BLAKE              30 PARIS      P2 BOLT     GREEN          17
S2 JONES              10 PARIS      P5 CAM      BLUE           12
S3 BLAKE              30 PARIS      P5 CAM      BLUE           12

10 rows selected.

SQL> SELECT SUPPLIERS.*,PARTS.*
  2  FROM SUPPLIERS, PARTS
  3  WHERE SUPPLIERS.City > PARTS.City
  4  ;

S# SNAME          STATUS CITY       P# PNAME    COLOR      WEIGHT CITY
-- ---------- ---------- ---------- -- -------- ------ ---------- ----------
S2 JONES              10 PARIS      P1 NUT      RED            12 LONDON
S3 BLAKE              30 PARIS      P1 NUT      RED            12 LONDON
S2 JONES              10 PARIS      P4 SCREW    RED            14 LONDON
S3 BLAKE              30 PARIS      P4 SCREW    RED            14 LONDON
S2 JONES              10 PARIS      P6 COG      RED            19 LONDON
S3 BLAKE              30 PARIS      P6 COG      RED            19 LONDON

6 rows selected.

SQL> SELECT P#, MIN(WEIGHT)
  2  FROM PARTS
  3  WHERE COLOR = 'RED'
  4  ;
SELECT P#, MIN(WEIGHT)
       *
ERROR at line 1:
ORA-00937: not a single-group group function

SQL> Select p.p# , p.pname, sum(sp.qty) as sum
  2  from Shipments sp, Parts p
  3  where p.p# = sp.p#
  4  and p.weight > 0
  5  group by sp.p#, p.p#, p.pname
  6  having sum( sp.qty) >= 100
  7  ;

P# PNAME           SUM
-- -------- ----------
P1 NUT             600
P2 BOLT           1000
P3 SCREW           400
P4 SCREW           500
P5 CAM             500
P6 COG             100

6 rows selected.

SQL> Select p.p# , p.pname, h.sum
  2  from parts p, (Select sp.p#, sum(sp.qty) as sum
  3  from shipments sp
  4  group by sp.p#
  5  having sum(sp.qty) >= 100) h
  6  where p.p# = h.p#
  7  and p.weight > 0;

P# PNAME           SUM
-- -------- ----------
P1 NUT             600
P2 BOLT           1000
P3 SCREW           400
P4 SCREW           500
P5 CAM             500
P6 COG             100

6 rows selected.

SQL> Select k.p# , k.pname, h.sum
  2  from
  3  (Select sp.p#, sum(sp.qty) as sum
  4  from shipments sp
  5  group by sp.p#
  6  having sum(sp.qty) >= 100) h
  7  ,
  8  (Select p.p# , p.pname
  9  From Parts p
 10  where p.weight > 0)k
 11  where k.p# = h.p#;

P# PNAME           SUM
-- -------- ----------
P1 NUT             600
P2 BOLT           1000
P3 SCREW           400
P4 SCREW           500
P5 CAM             500
P6 COG             100

6 rows selected.

SQL> SELECT PARTS.*
  2  FROM PARTS
  3  WHERE PARTS.PNAME LIKE 'C%';

P# PNAME    COLOR      WEIGHT CITY
-- -------- ------ ---------- ----------
P5 CAM      BLUE           12 PARIS
P6 COG      RED            19 LONDON

SQL> SELECT S#
  2  FROM SUPPLIERS
  3  WHERE STATUS > 25
  4  ;

S#
--
S3
S5

SQL> SELECT S#
  2  FROM SUPPLIERS
  3  WHERE STATUS is NULL;

no rows selected

SQL> SELECT S#, P#
  2  FROM SHIPMENTs
  3  WHERE QTY IS NOT NULL;

S# P#
-- --
S1 P1
S1 P2
S1 P3
S1 P4
S1 P5
S1 P6
S2 P1
S2 P2
S3 P2
S4 P2
S4 P4

S# P#
-- --
S4 P5

12 rows selected.

SQL> SELECT Sname
  2  FROM SUPPLIERS
  3  WHERE S# not in (SELECT S#
  4  FROM SHIPMENTS
  5  WHERE P# = 'P2');

SNAME
----------
ADAMS

SQL> SELECT S#
  2  FROM SUPPLIERs
  3  WHERE CITY =
  4  (SELECT CITY
  5  FROM SUPPLIERs
  6  WHERE S#='S1');

S#
--
S1
S4

SQL> SELECT SNAME
  2  FROM SUPPLIERS
  3  WHERE NOT EXISTS
  4  (SELECT *
  5  FROM SHIPMENTS
  6  WHERE S# = Suppliers.S#
  7  AND P# = 'P2');

SNAME
----------
ADAMS

SQL> SELECT SNAME
  2  FROM SUPPLIERS
  3  WHERE NOT EXISTS
  4  (SELECT *
  5  FROM SHIPMENTS
  6  WHERE S#=S#
  7  AND P#='P2');

no rows selected

SQL> SELECT P#
  2  FROM PARTS
  3  WHERE WEIGHT > 16
  4  UNION
  5  SELECT P#
  6  FROM SHIPMENTS
  7  WHERE S# = 'S2';

P#
--
P1
P2
P3
P6

SQL> UPDATE Suppliers
  2  SET Status=Null
  3  WHERE S# = 'S5';

1 row updated.

SQL> UPDATE SHIPMENTS
  2  SET QTY = 0
  3  WHERE 'London' =
  4  (SELECT CITY
  5  FROM SUPPLIERS
  6  WHERE SHIPMENTS.S#= SUPPLIERS.S#);

0 rows updated.

SQL> DELETE
  2  FROM SHIPMENTs
  3  WHERE P# = 'P3';

1 row deleted.

SQL> DELETE
  2  FROM SHIPMENTs
  3  WHERE 'London'=
  4  (SELECT CITY
  5  FROM SUPPLIERs
  6  WHERE S.S# =
  7  SP.S#);
SP.S#)
*
ERROR at line 7:
ORA-00904: invalid column name

SQL> INSERT
  2  INTO SUPPLIERs(S#, SNAME, STATUS)
  3  VALUES ('S6', 'James', 35);

1 row created.

SQL> INSERT
  2  INTO SHIPMENTs(S#,P#,QTY)
  3  VALUES ('S2', 'P2', 1000);

1 row created.

SQL> CREATE TABLE TEMP
  2  (SNUM CHAR(5),
  3  PNUM CHAR(5),
  4  QTY NUMBER(38));

Table created.

SQL> INSERT
  2  INTO TEMP (SNUM, PNUM, QTY)
  3  SELECT *
  4  FROM SHIPMENTS
  5  Where S#='S2';

3 rows created.

SQL> exit

Week of Feb 20 - Feb 27, 2006

February 20, 2006

The following code creates a temporary table that holds the supplier number, the part number, and the quantity shipped:

    create table temp (
        snum char(5),
        pnum char(5),
        qty number(38));

The following code inserts tuples into the temporary table. The rows come from a SELECT that picks all the shipments whose supplier is S2:

    insert into temp (snum, pnum, qty)
    select * from shipments where s# = 'S2';

QUERY OPTIMIZATION
-------------------------------------------------------------------------------

The following code gets the part name from the product of the PARTS and SHIPMENTS tables, keeping the rows where the weight is greater than 1000. The inner query has no join condition, so it is a Cartesian product rather than a natural join:

    SELECT G.PNAME
    FROM (SELECT p.P#, p.PNAME, p.COLOR, p.WEIGHT, p.CITY, sp.S#, sp.qty
          FROM PARTS P, SHIPMENTS SP) g
    WHERE g.WEIGHT > 1000

This is the code we went over step by step to convert an SQL statement into a relational algebra expression:

    Select p.pname, sum(sp.qty) as sum
    from parts p, shipments sp
    where p.pnum = sp.pnum
    and p.weight > 1000
    group by p.pname, p.pnum
    having sum(sp.qty) >= 1

1.) Take the product of the 2 tables:
    (P × SP)
2.) Keep only the rows where the part numbers match and the weight is over 1000 (together with step 1 this is the join):
    σ P.PNUM = SP.PNUM and P.WEIGHT > 1000 (P × SP)
3.) Group by part name and part number, sum the quantity, and keep the groups that satisfy the HAVING condition:
    σ SUM >= 1 (γ P.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σ P.PNUM = SP.PNUM and P.WEIGHT > 1000 (P × SP)))
4.) Finally, project the part name and the sum to match the original SQL statement:
    π P.PNAME, SUM (σ SUM >= 1 (γ P.PNAME, P.PNUM, SUM(SP.QTY) → SUM (σ P.PNUM = SP.PNUM and P.WEIGHT > 1000 (P × SP))))

(The above code is from queryoptimization.doc on Dr. Lin's website.)

The Advert-IT database development group has outlined 17 steps for optimizing queries:

1. Show the minimum number of fields within the query.
2. Index all the join fields, fields in expressions, sorted fields, and fields used in restrictions.
3. Use the table's primary keys and unique keys as much as possible.
4. Numeric keys are ideal instead of text keys.
5. Do not use blank unique fields.
6. Do not use IIf (immediate-if) functions in the query statements.
7. Do not use domain aggregations.
8. Use BETWEEN and = instead of <, >, which speeds up the queries.
9. Use fixed column headings in queries.
10. For reports based on queries, use Portrait view in preference to Landscape and set Fast Laser Printing to Yes (View, Options, Other Properties).
11. Use make-table queries when running on static data.
12. Do not use count(columnName); use count(*) instead.
13. For joined columns in 1-to-x relationships, test the comparative performance of placing the restriction on the 1 side or on the x side.
14. Use shorter names instead of longer ones.
15. Normalize all the tables.
16. Denormalize the tables.
17. Avoid the use of DistinctRow.

The end of class consisted of Fahad presenting how to set up Oracle on your personal computer: creating new users, creating a new database, and connecting to a database via Oracle and the command line. He concluded with a small overview.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

February 22, 2006 (Notes from optimization_law.doc)

Product and join operations are commutative and associative (the following are from optimization_law.doc):

    R × S = S × R
    R ⋈ S = S ⋈ R        (⋈ is natural join)
    R ∪ S = S ∪ R
    R ∩ S = S ∩ R

    R × (S × T) = (R × S) × T
    R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
    R ∪ (S ∪ T) = (R ∪ S) ∪ T
    R ∩ (S ∩ T) = (R ∩ S) ∩ T

The most effective way to lower the number of rows is to push selections down so that they run before the joins, since joins take much more time. The document has an example with over 10 thousand rows in which the difference in running time is easy to notice.

Basic selection properties:
1. σA(R) = σA(σA(R))
2. σA(σB(R)) = σB(σA(R))

The following three rules relate selection and the set operators; they push the selections down in an expression tree.
1.
2.
3.

In class we then went over each group's database to check whether progress is being made.

For the next assignment (our project) we need to:
- find the term frequency for each term;
- find the TFIDF, for which we need to find the terms in all the documents;
- find the LSI as well;
- finally, create a database with token, filename, and the TFIDF values.

ALL OF THIS IS DUE March 8, 2006
MIDTERM NEXT WEEK March 3, 2006

Week of Feb 27 – March 5, 2006
LECTURE NOTES (REQUIRED ALONG WITH THE PROGRESS REPORT):

February 27, 2006

SUBJECT: BITMAP INDEXES

- A bitmap index on a column is a collection of bit-vectors, one per distinct value; each bit-vector has one bit per record.
- If record i holds the value, bit i of that value's vector is '1'; if not, it is '0'.
- Distinct values are the different values that occur within the column.
- The length of each bit-vector is the number of tuples in the table.

    F    G
    30   foo
    30   bar
    40   baz
    50   foo
    40   bar
    30   baz

Granular form (for column F):

    value   granular values   bit-vector
    30      e1, e2, e6        110001
    40      e3, e5            001010
    50      e4                000100

- The bitmap index form is a table with two columns: the distinct value and its bit-vector. The example has 6 tuples, so every bit-vector has the shape 'xxxxxx'.
- The bit-vector of a given value has one position per tuple, with '1' where that value exists and '0' otherwise.
- ex) 30 | 110001
- A granular value is an 'e' followed by the number of the row that holds the value.
- The entries of the index are sorted by value.
- If the table has n tuples, the bitmap index can need up to n^2 bits in total: the worst case is n distinct values, each with an n-bit vector.
- The advantage of this structure is faster searching. For example:

    SELECT title FROM Movie WHERE studioName = 'Disney' AND year = 1995;

  We look up the bitmap indexes for studioName and year and get 100001 (studioName = 'Disney') and 100100 (year = 1995). Intersecting them (bitwise AND) gives 100000, so we know that only tuple 1 has both 'Disney' and 1995.
- The bitmap index can also be applied to other SQL statements, such as range queries:

    SELECT * FROM R WHERE age BETWEEN 23 AND 25 AND salary BETWEEN 60 AND 70;

  ex) of the bitmap indices:

    Age   Salary
    25    60
    22    55
    30    70
    22    55
    23    55
    25    100
    23    45
    30    45

    AGE:                 SALARY:
    22   01010000        45    00000011
    23   00001010        55    01011000
    25   10000100        60    10000000
    30   00100001        70    00100000
                         100   00000100

  For the query above, OR the age vectors for 23 and 25 (00001010 OR 10000100 = 10001110), OR the salary vectors for 60 and 70 (10000000 OR 00100000 = 10100000), then AND the two results: 10000000, so only tuple 1 (age 25, salary 60) qualifies.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

March 1, 2006

- MIDTERM EXAM on tables, SQL statements, term frequency, query algebra, and rules.

Week of March 6 – March 12, 2006
LECTURE NOTES (REQUIRED ALONG WITH THE PROGRESS REPORT):

March 6, 2006

Given the following database from relational_database_example.doc (the DB-2 example), do the following:

EMPLOYEE
PRIKEY    FNAME     MINIT    LNAME     SSN        BDATE       ADDRESS                   SEX      SALARY   SUPERPRIKEY  DNO
char(12)  char(10)  char(1)  char(10)  char(10)   char(10)    char(30)                  char(1)  decimal  char(12)     short
1         John      B        Smith     123456789  1965-01-09  731 Fondren, Houston, TX  M        30000    2            5
2         Franklin  T        Wong      333445555  1955-12-08  638 Voss, Houston, TX     M        40000    8            5
3         Alicia    J        Zelaya    999887777  1968-07-19  3321 Castle, Spring, TX   F        25000    4            4
4         Jennifer  S        Wallace   987654321  1941-06-20  291 Berry, Bellaire, TX   F        43000    8            4
5         Ramesh    K        Narayan   666884444  1962-09-15  975 Fire Oak, Humble, TX  M        38000    2            5
6         Joyce     A        English   453453453  1972-07-31  5631 Rice, Houston, TX    F        25000    2            5
7         Ahmad     V        Jabbar    987987987  1969-03-29  980 Dallas, Houston, TX   M        25000    4            4
8         James     E        Borg      888665555  1937-11-10  450 Stone, Houston, TX    M        55000    null         1

1.) Create 100 tuples of this database by repeating each row, so that there will be several duplicates. The only difference between the tuples will be the primary key, which will run from 1 to 100.
2.) Write Java code that will generate, as SQL code, all possible associations (combinations) of the columns.
3.) The SQL code generated, roughly 2000 possible statements, will be printed out to a separate file for viewing.
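
One way to read item 2 is "one GROUP BY statement per non-empty subset of the columns". The sketch below follows that reading; the class name ColumnCombos, the method generate, and the exact shape of the generated SQL (a GROUP BY with HAVING COUNT(*) > 1) are assumptions made for illustration, not the required solution. With all 11 EMPLOYEE columns it yields 2^11 - 1 = 2047 statements, which is in line with the "2000 possible statements" of item 3. Writing the statements to a file is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: names and the generated SQL shape are hypothetical.
class ColumnCombos {

    // Returns one GROUP BY statement per non-empty subset of the given columns.
    static List<String> generate(String table, List<String> columns) {
        List<String> statements = new ArrayList<>();
        int n = columns.size(); // assumes n < 31 so the bitmask fits in an int
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> chosen = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(columns.get(i)); // bit i set: column i is in this subset
                }
            }
            String cols = String.join(", ", chosen);
            statements.add("SELECT " + cols + ", COUNT(*) FROM " + table
                    + " GROUP BY " + cols + " HAVING COUNT(*) > 1;");
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("PRIKEY", "FNAME", "MINIT", "LNAME", "SSN",
                "BDATE", "ADDRESS", "SEX", "SALARY", "SUPERPRIKEY", "DNO");
        List<String> sql = generate("EMPLOYEE", columns);
        System.out.println(sql.size() + " statements"); // 2047 statements
        System.out.println(sql.get(0));
    }
}
```

The bitmask loop was chosen because it enumerates every subset exactly once without recursion; each value of mask is one combination of columns.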

Send all 3 of the above to Dr. Lin: due by Wednesday for 120 points, or due by Monday for 100 points.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

March 8, 2006

The LSI table should have been due today; the combination of all 20k files will be due by next week.

The following code (this is Dr. Lin's code) fills the LSI table: it computes the LSI (TFIDF) value for each distinct token in each document. It should be run over all the tokens in the supplied 20,000 files, due by next week Wednesday.

    insert into lsi (id, token, tfidf)
    select id, token, occurrences * log(10, (total / concerned)) TFIDF
    from (select count(distinct id) total from data),
         (select distinct id, token, count(position) occurrences
          from data
          group by id, token) pairs
         natural join
         (select count(distinct id) concerned, token
          from data
          group by token);

THE SQL STATEMENTS I USE TO COMPUTE THE VALUES:

1.) tf(ti, dj) = number of occurrences of term i ('the') in document j ('58142')
    select count(id) from database where token = 'the' and id = '58142';
2.) Tr = total number of documents
    select count(distinct id) from database;
3.) Tr(ti) = number of documents containing term i ('the')
    select count(distinct id) from database where token = 'the';
4.) Then use the Java Math class to figure the logarithm: tfidf = tf(ti, dj) * log10(Tr / Tr(ti)).
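
The four steps above can be checked on a tiny in-memory example before running them against the database. This is only a sketch: the class TfIdfSketch, its method names, and the three sample documents are made up for illustration and are not part of the project code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: class, methods and sample documents are hypothetical.
class TfIdfSketch {

    // Step 1: tf = number of occurrences of the token in one document.
    static int termFrequency(List<String> docTokens, String token) {
        int count = 0;
        for (String t : docTokens) {
            if (t.equals(token)) count++;
        }
        return count;
    }

    // Step 3: number of documents that contain the token at least once.
    static int docsContaining(List<List<String>> docs, String token) {
        int count = 0;
        for (List<String> d : docs) {
            if (d.contains(token)) count++;
        }
        return count;
    }

    // Step 4: tfidf = tf * log10(total documents / documents containing the token).
    static double tfidf(List<List<String>> docs, int docIndex, String token) {
        int tf = termFrequency(docs.get(docIndex), token);
        int total = docs.size(); // step 2: total number of documents
        int concerned = docsContaining(docs, token);
        if (concerned == 0) return 0.0; // token occurs nowhere, avoid dividing by zero
        return tf * Math.log10((double) total / concerned);
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the", "query", "the", "index"),
                Arrays.asList("the", "bitmap"),
                Arrays.asList("the", "join", "query"));
        System.out.println(tfidf(docs, 0, "the"));   // token in every document: log10(1) = 0
        System.out.println(tfidf(docs, 0, "index")); // token in one document only: log10(3)
    }
}
```

A token that appears in every document gets a value of 0 no matter how often it occurs, which is the point of the log10(Tr / Tr(ti)) factor.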

Here is the piece of code used:

    // tokHold and idHold hold the current token and document id.
    tf   = "select count(id) from database where token = '" + tokHold + "' and id = '" + idHold + "';";
    tr   = "select count(distinct id) from database;";
    trti = "select count(distinct id) from database where token = '" + tokHold + "';";
    tfSet   = s1.executeQuery(tf);
    trSet   = s2.executeQuery(tr);
    trtiSet = s3.executeQuery(trti);
    // Move each result set to its single row.
    tfSet.next();
    trSet.next();
    trtiSet.next();
    value = tfSet.getDouble(1) * Math.log10(trSet.getDouble(1) / trtiSet.getDouble(1));
    // The last line inserts the lsi value into the new table which holds all the tfidf values.
    s.execute("insert into tfidf values('" + tokHold + "', '" + idHold + "', '" + value + "');");

It is not possible to generate the LSI table example Dr. Lin gave last week because of Oracle's limit of 1000 columns per table. So it will be reformatted into 3 columns: token, id, and LSI value. The table should contain the tokens of all 20,000 files.

The new tokenizer should recognize numbers combined with words, such as cs157b, as one token instead of treating numbers and words differently.

Week of March 13 – March 19, 2006
LECTURE NOTES (REQUIRED ALONG WITH THE PROGRESS REPORT):

March 13, 2006 (on appiori_sql.doc)

Table of food with 8 tuples. A new table is shown with one column for each different type of food: every time the item exists in a transaction there is a 1, and a 0 otherwise. But this method is not efficient because of the amount of space needed to hold all the tuples of possible items.

    drop table supermarket;
    create table supermarket(
        transaction varchar(5),
        item1 varchar(5),
        item2 varchar(5),
        item3 varchar(5),
        ...
        item7 varchar(5));

Association Rules: how to find the important items?

The bad way:

    select item1 from supermarket group by item1 having count(*)>1;

then repeat the previous SQL statement for all the columns (8 times), then continue with the pairs:

    select item1,item2 from supermarket group by item1,item2 having count(*)>1;
    ...
and so on for all combinations.

To know whether or not an item set is important, look at the first pair of columns: if item1 and item2 are each important individually, then figure the association of item1 & item2. Only if the subset is significant do we compute the superset.

Functional dependencies in the shown table: if the person bought parsley then they bought cucumber, and if the person bought onion then they bought tomato.

How to find functional dependencies (according to Dr. Lin). The grouping has to be on item1 alone; if item2 is also in the GROUP BY, count(distinct item2) is 1 in every group and the query can never return a row:

    Select item1, count(distinct item2)
    from supermarket
    group by item1
    having count(distinct item2) > 1;

No rows returned means that for every x there is only one y, so x -> y.

The code I used to figure functional dependency on the EMPLOYEE table, with item1 and item2 replaced by the pair of columns being tested:

    Select item1
    from employee
    group by item1
    having count(distinct item2) > 1;

Alternative code, but more complicated:

    SELECT A, MIN(B), MAX(B) FROM Table GROUP BY A HAVING MIN(B) <> MAX(B);
    => no rows returned means B is determined by A (A -> B)
    SELECT B, MIN(A), MAX(A) FROM Table GROUP BY B HAVING MIN(A) <> MAX(A);
    => the next step, the same test in the other direction: no rows returned means A is determined by B

Another example:

    select sname, count(distinct status) from suppliers group by sname;

Homework: find the functional dependencies for the EMPLOYEE table. Decision rules?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

March 15, 2006

Project stuff: the new TF of pairs = (# of occurrences) / (some sort of normalization, computationally inexpensive). The normalization is whatever we can think of. This is now the association of 2 tokens. The normalization should be inexpensive in the amount of computation done by the computer and reflective of the length of the articles.

A data warehouse is a storage table for easier data analysis. The data warehouse is similar to the regular database, but the columns of the database are the rows of the data warehouse. This type of storage is ideal for data analysis, but it cannot be used for modification.
To analyze new data, the data warehouse must be repopulated with some sort of crawler.

A group member found SQL statements to find functional dependencies. As written down in class, both statements grouped by the column under test, which can never return a row; the corrected form groups by the determining column only:

    SELECT A, MIN(B), MAX(B) FROM Table GROUP BY A HAVING MIN(B) <> MAX(B);
    // no rows returned means B is determined by A
    SELECT B, MIN(A), MAX(A) FROM Table GROUP BY B HAVING MIN(A) <> MAX(A);
    // next step, the other direction: no rows returned means A is determined by B

Week of March 20 – March 27, 2006

March 20, 2006

Association rules: we keep an item only if it appears at least threshold times; if it does not appear at least that often, we don't care about it. That is for length 1. For length 2 or greater, we form the combinations of items of that length, then check against the threshold whether each combination appears at least that many times.

    (7 choose 1) + (7 choose 2) + ... + (7 choose 7) = 2^7 - 1 = 127

Questions on the quiz:
1.) List the associations of length 2 with threshold = 3.
2.) Which triples meet the Apriori conditions?

TABLE FROM THE QUIZ

    t1: cucumber, parsley, onion, tomato, salt, bread
    t2: tomato, cucumber, parsley
    t3: tomato, cucumber, onion, parsley
    t4: tomato, cucumber, onion, bread
    t5: tomato, salt, onion
    t6: bread, cheese
    t7: tomato, cheese, cucumber
    t8: bread

What is the answer to this: which item sets of length 2 meet a threshold of 3 or greater?

ANSWER: cucumber/tomato (5 transactions) and tomato/onion (4 transactions). Counting the table, cucumber/parsley, tomato/parsley, and cucumber/onion each appear in exactly 3 transactions, so they also qualify when the threshold of 3 is inclusive.

Wednesday we will cover the decision rules.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

March 22, 2006

Whenever you hand in something, place your name in the filename, even for the Java files, so you have to modify the Java class names as well.

Stanford lecture tomorrow, Thursday, from 7:30 pm to 8:30 pm. It is the last lecture of the semester according to him.

He then discusses the grading for the lecture notes for about 15 minutes.

Midterm the Monday after spring break.

Class notes should be added upon (more notes from outside sources and practice) to receive full credit. Minimum notes will get the minimum passing grade.

Project due when we come back: the two-token TFIDF project.
Soft copy of the notes is due Monday when we come back. It should include project notes.

He says the grades will be distributed as follows: 100, 95, 90, 85, 80, 75, 70, 50 (if you don't turn it in).

So on Monday for CS157B we have: midterm, project, soft copy notes.

Last 10 minutes of class...

BITMAPS: table with two columns
- is better than a data warehouse
- vertical representation
- ex) a translation of a bitmap from his bitmap slides (online) of example 1; the bit-vectors follow the F/G table from the February 27 notes:

            1   2   3   4   5   6
    -----------------------------
    30      1   1   0   0   0   1
    40      0   0   1   0   1   0
    50      0   0   0   1   0   0
    foo     1   0   0   1   0   0
    bar     0   1   0   0   1   0
    baz     0   0   1   0   0   1

This way of storing a database is better since it takes up fewer bits than the old way: each 0 or 1 is only one bit, while storing the token names over and over can take a lot more.

Know how to convert a regular table into a bitmap. We also covered the binary conversion of the tables to find the intersection of several columns.
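
As practice for converting a regular table into a bitmap and intersecting columns, here is a minimal sketch built on java.util.BitSet from the standard library. The class BitmapIndexSketch and its method names are made up for illustration; the data is the F/G table from the February 27 notes.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: class and method names are hypothetical.
class BitmapIndexSketch {

    // Builds one bit-vector per distinct value; bit i is set when row i holds that value.
    static Map<String, BitSet> buildIndex(String[] column) {
        Map<String, BitSet> index = new LinkedHashMap<>();
        for (int row = 0; row < column.length; row++) {
            index.computeIfAbsent(column[row], v -> new BitSet(column.length)).set(row);
        }
        return index;
    }

    // Rows matching "colA = a AND colB = b": intersect the two bit-vectors.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // clone so the index itself is not modified
        result.and(b);
        return result;
    }

    // Renders a bit-vector as a 0/1 string of the given length, row 1 first.
    static String toBits(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] f = {"30", "30", "40", "50", "40", "30"};
        String[] g = {"foo", "bar", "baz", "foo", "bar", "baz"};
        Map<String, BitSet> fIndex = buildIndex(f);
        Map<String, BitSet> gIndex = buildIndex(g);
        System.out.println(toBits(fIndex.get("30"), 6));                         // 110001
        System.out.println(toBits(and(fIndex.get("30"), gIndex.get("baz")), 6)); // 000001
    }
}
```

The intersection 000001 says that only tuple 6 has F = 30 and G = baz, which matches the table.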