On-Line Testing Center
Database Laboratories
Root Questions
Automating Homeworks

Tools for Increasing the Efficiency of Teaching
1. Laboratories that give immediate, accurate feedback, e.g., for teaching SQL.
2. Automated homeworks that simulate the effect of carefully graded “long-answer” homework.
3. Lectures consisting of PowerPoint slides with voiceover.

Productivity in Education
The education industry has a terrible productivity-improvement record. Using database courses as a focus, we have developed a system, OTC (On-Line Testing Center), that automates grading and allows teaching effort to be reused.

Comparison: Versus Telecom
Year     Tuition    3-min LD call      Ratio (tuition/call)
1959     $1,200     $3.00                  400
2004     $30,000    $0.15              200,000
In 45 years, high-end college tuition has gotten 500 times more expensive relative to a long-distance phone call!

But Isn’t … ?
The telecom industry is arguably the best example of the use of technology to reduce costs. How about the much-maligned US Post Office?

Comparison: Versus Post Office
Year     Tuition    Airmail stamp      Ratio (tuition/stamp)
1959     $1,200     $0.08               15,000
2004     $30,000    $0.37               81,000
In 45 years, high-end college tuition has gotten 5.4 times more expensive relative to a stamp!

One Thing I’d Like to Do, but Haven’t
A global, on-line TA system. Students can ask questions about a given course via email and get a fast email response. A TA network is guided by a database of previous questions and answers.

Lectures
We have PowerPoint slides with voiceover for an introductory DB course. Intended use: play them for 50-60% of the lecture, and use the rest of the time for discussion. Pace is critical --- stop for class thought after each slide.

Solution
Beer groups with at least 3 non-NULL bars, and also beer groups where the manufacturer is Pete’s:
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR
           beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
The subquery produces the beers manufactured by Pete’s.
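To see the Solution query behave as described, here is a minimal sketch using Python’s built-in sqlite3 module. The schemas Beers(name, manf) and Sells(bar, beer, price) are inferred from the column names in the query, and every row is made up for illustration. Coors appears in three Sells rows, but one has a NULL bar, so COUNT(bar) is 2 and that group is dropped; Wicked Ale (assumed here to be a Pete’s beer) survives through the IN subquery even though only one bar sells it.

```python
import sqlite3

# Toy database. The schemas are inferred from the query's column names;
# all rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Beers (name TEXT, manf TEXT);
    CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL);
    INSERT INTO Beers VALUES ('Bud', 'Anheuser-Busch'),
                             ('Coors', 'Coors'),
                             ('Wicked Ale', 'Pete''s');
    INSERT INTO Sells VALUES ('Joe''s Bar', 'Bud', 4.00),
                             ('Sue''s Bar', 'Bud', 5.00),
                             ('Mary''s Bar', 'Bud', 6.00),
                             ('Joe''s Bar', 'Coors', 6.50),
                             ('Sue''s Bar', 'Coors', 5.50),
                             (NULL, 'Coors', 5.00),
                             ('Joe''s Bar', 'Wicked Ale', 3.00);
""")

# The query from the Solution slide; SQL escapes the apostrophe by doubling it.
QUERY = """
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3 OR
           beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s')
"""

# Sort in Python because the query itself has no ORDER BY.
result = sorted(conn.execute(QUERY).fetchall())
print(result)  # [('Bud', 5.0), ('Wicked Ale', 3.0)]
```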
Laboratory Assignments
Conventional SQL homework: “Here is a database; write these queries in SQL.” TAs look at the SQL answers and try to figure out whether the queries do what they’re supposed to do. The rate of regrades tells me this task is too hard to get right.

OTC Laboratory
Students get a description of a schema and English descriptions of several queries to write. Queries are sent to a database system (Oracle or MySQL) for testing on a preset test database.

OTC Laboratory --- (2)
Three possible outcomes:
1. Syntactically incorrect --- they get the DBMS’s error feedback.
2. Semantically incorrect --- they get to see what their query did on another sample database and what it should have done.
3. Semantically correct --- they get credit for the problem.

Creating an SQL Lab
1. A stem describing the schema and the queries.
2. CREATE TABLE statements to define the schema.
3. INSERT statements for the test DB.
4. INSERT statements for the sample DB.
5. Reference queries that produce the correct answers.

Other Labs
Recently added: similar lab-creation facilities for:
1. Relational algebra.
2. JDBC.
3. XQuery.

Policy Issues
The lab is set up so students may submit a query as many times as they like. Once a query is correct, it can be stored and the next one worked on.

Problem: Automate Construction of Sample DBs
Queries involve particular constants. An explanation built on a sample DB whose constants have been changed doesn’t explain anything.
Example: “find all the bars in Boston.” The sample DB had better not change 'Boston' in tuples, or you’ll be explaining: “if the DB contains ('Joe''s Bar', 'Miami'), you need to produce 'Joe''s Bar' in your answer.”

A Harder Example
Consider the query: “find all the beers Joe’s Bar sells for less than $5.” You can’t randomly change prices in tuples like ('Joe''s Bar', 'Bud', 4.00), or you’ll give advice like “if the DB contains ('Joe''s Bar', 'Coors', 6.50), you need to produce 'Coors'.”

Example --- Continued
You need a “less-than-$5-preserving” transformation. Example: p -> 2*p - 5. Since 2p - 5 < 5 exactly when p < 5, every price stays on the same side of $5 (e.g., 4.00 becomes 3.00 and 6.50 becomes 8.00).

Automating Homework
The heart of OTC is a system for automating homeworks and exams.
Goal 1: Encourage students to work “long-answer” problems for themselves.
Goal 2: Inhibit cheating.
Goal 3: Eliminate the drudgery of grading.

Modeling “Long-Answer” Questions with Multiple-Choice
Here is a typical “long-answer” question we might ask in a DB course: Relation R consists of the following tuples, and relation S has the following tuples. Compute the join of R and S.

Root Questions
A root question is a multiple-choice question with several right and many wrong answers. Example: Relation R consists of the following tuples, and relation S has the following tuples. Which of these tuples is in the join of R and S?

Writing a Root Question
The question designer provides several correct answers. In our example, each tuple of the join could be one correct answer. Many wrong answers are also provided. Here, any tuple of the correct length that is not in the join could be used.

Writing Root Questions --- (2)
For each wrong answer, write a choice explanation that gives the student a hint or explains why it is wrong. For the question as a whole, write a question explanation.

Assigning Root Questions
The instructor develops an assignment consisting of several root questions. 4-6 seems to be the right number --- we’ll see why.
Students take the assignment as many times as they like and are encouraged to get a perfect score. Only the final score counts.

Assigning Root Questions --- (2)
Each time students open the assignment, they get the same questions, but with a different choice of one correct and three incorrect answers, in random order. To prevent rapid-fire guessing, a student may open an assignment only once per x minutes (instructor’s choice).

Student Responses
Each root question suggests a conventional “long-answer” question that the student should work. Example: for the join question, they may as well compute the entire join. With the join tuples listed on scratch paper, they can quickly solve any instance of the root question.

Automatic Student Help
When students submit work, they immediately get the choice explanations for the questions they got wrong. After the due date, students can see their assignments with both the choice explanations and the question explanations.

How Many Questions?
We recommend 4-6 questions per assignment. Fewer than 4 encourages students to guess; too many questions run the risk that a student will miss one through carelessness. When the system was first used at Stanford with no limit, some students tried hundreds of times.

Comparison
A simpler scheme is used in courses like physics, where questions are parameterized and the correct answer is computed by a formula. Example: A weight of $w kilograms is dropped from height $h. How long does it take the weight to reach the ground?

Comparison --- (2)
A question is generated by choosing random values of the parameters, and the answer is checked against the result of the formula. Root questions simulate this question type by selecting many parameter values and asking for a correct pairing of parameters and result.

Comparison --- (3)
Example: A weight of w kilograms is dropped from height h. For which of the following triples (w, h, t) is t the time it takes the weight to reach the ground?
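The weight-drop root question can be generated mechanically. The sketch below is an illustration, not OTC’s actual generator, and it rests on several assumptions: g = 9.8 m/s², h in meters, and no air resistance, so t = sqrt(2h/g) and the mass w plays no role (which makes w a natural distractor for students). The parameter ranges, the distractor factors, and all function names are hypothetical. The instance picker follows the Assigning Root Questions --- (2) slide: one correct answer and three incorrect ones, in random order.

```python
import math
import random

G = 9.8  # m/s^2; assumes h in meters and no air resistance


def fall_time(h):
    """Seconds for an object dropped from height h to reach the ground; mass is irrelevant."""
    return math.sqrt(2 * h / G)


def is_right(triple):
    """(w, h, t) is a right answer if t matches the formula up to rounding to 0.01 s."""
    _, h, t = triple
    return abs(t - fall_time(h)) < 0.01


def make_fall_question(n_right=5, n_wrong=15, seed=0):
    """Build the answer pools of a root question: several right and many wrong triples."""
    rng = random.Random(seed)
    right, wrong = [], []
    for _ in range(n_right):
        w, h = rng.randint(1, 20), rng.randint(5, 100)
        right.append((w, h, round(fall_time(h), 2)))
    for _ in range(n_wrong):
        w, h = rng.randint(1, 20), rng.randint(5, 100)
        # Distractor: scale the true time by a factor far enough from 1 to be unambiguous.
        factor = rng.choice([0.5, 0.7, 1.4, 2.0])
        wrong.append((w, h, round(factor * fall_time(h), 2)))
    return right, wrong


def pick_instance(right, wrong, rng):
    """One opening of the assignment: one right and three wrong triples, shuffled."""
    choices = [rng.choice(right)] + rng.sample(wrong, 3)
    rng.shuffle(choices)
    return choices


right, wrong = make_fall_question()
choices = pick_instance(right, wrong, random.Random(1))
for c in choices:
    print(c, "right" if is_right(c) else "wrong")
```

Because each opening draws a fresh instance from large pools, a student who has worked out the formula answers every instance at once, which is the same effect the join question gets from computing the whole join on scratch paper.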
Comparison --- (4)
In the database domain, the answers to many kinds of questions cannot be computed by an arithmetic formula:
“Which of these functional dependencies follows from the given FDs?”
“Which of these schedules is serializable?”
“For which relation sizes is query plan A better than plan B?”

Comparison --- (5)
If you are willing to write a program to (say) test serializability, you can write a program that generates a root question with lots of serializable and lots of unserializable schedules. The output of this program can be fed automatically into OTC.

OTC Status
About 400 root questions, mostly on databases, have been developed. Many include explanations. Let’s face it: writing a root question correctly is hard. But once written and debugged, it can be used in many courses.

OTC History --- Spring and Fall 2002
Spring 2002: supported one assignment in Stanford CS347 (Transaction-Processing and Distributed Databases).
Fall 2002: supported CS145 (Intro. DB course at Stanford): 2 lab assignments, 11 root-question assignments, and a midterm (not root questions).

OTC --- Winter 2003
Supported CS245 (DB Implementation, Hector Garcia) at Stanford.
Supported a CS145/245-like course at North Carolina State (Rada Chirkova).

OTC Status --- Spring 2003
Supported CS145, CS347, and CS345 (DB Theory) at Stanford. Continued support at NC State. Supported CS145-like courses at UC Santa Cruz (Arthur Keller) and Univ. of Leipzig (Erhard Rahm). Supported a Discrete Math course at NTU Athens (Foto Afrati).

OTC Status --- Fall 2003
New sites: Penn State, U. Chicago, Yale, U. Alabama, U. Karlsruhe, York College/CUNY, U. Business & Econ. (Athens).

OTC Development Team
The core software was developed by Murty Valiveti and his team at Gautami Software. Alan Beck and Ramana Yerneni adapted the OTC core for database instruction and implemented a number of important features.

Content Creators
Alan Beck: SQL, JDBC, and XQuery labs.
Austin Shoemaker: relational algebra lab.
Root-question developers: Foto Afrati, Rada Chirkova, Mayur Datar, Prasanna Ganesan, Wang Lam, Anand Rajaraman, Jeff Ullman, Jennifer Widom, Ramana Yerneni.

Find Out More
A tutorial for instructors is at www-db.stanford.edu/~ullman/pub/otc.pdf
Demo site: chub.stanford.edu:8181/CS145-demo/index.html
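The Comparison --- (5) slide proposes generating root questions with a program that tests serializability. Here is a minimal sketch of that idea. It assumes conflict-serializability (the standard precedence-graph test) as the meaning of “serializable.” The schedule representation, the sizes (two transactions, two actions each, items A and B), and all function names are hypothetical. The output is plain Python lists, not OTC’s input format, which these slides do not describe.

```python
import random


def precedence_edges(schedule):
    """Edge Ti -> Tj whenever an action of Ti precedes a conflicting action of Tj
    (different transactions, same item, at least one of the two is a write)."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a small directed graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))


def conflict_serializable(schedule):
    """A schedule is conflict-serializable iff its precedence graph is acyclic."""
    return not has_cycle(precedence_edges(schedule))


def random_schedule(rng, txns=("T1", "T2"), items=("A", "B"), ops_per_txn=2):
    """Randomly interleave random transactions, keeping each transaction's own order."""
    programs = {t: [(t, rng.choice("RW"), rng.choice(items)) for _ in range(ops_per_txn)]
                for t in txns}
    schedule = []
    while any(programs.values()):
        t = rng.choice([t for t in txns if programs[t]])
        schedule.append(programs[t].pop(0))
    return schedule


def make_schedule_question(n=200, seed=0):
    """Sort random schedules into right (serializable) and wrong answer pools."""
    rng = random.Random(seed)
    right, wrong = set(), set()
    for _ in range(n):
        s = tuple(random_schedule(rng))
        (right if conflict_serializable(s) else wrong).add(s)
    return sorted(right), sorted(wrong)


def show(schedule):
    """Textbook notation, e.g. r1(A); w2(B)."""
    return "; ".join(f"{op.lower()}{t[1:]}({x})" for t, op, x in schedule)


right, wrong = make_schedule_question()
print(len(right), "serializable and", len(wrong), "non-serializable schedules")
print("a right answer:", show(right[0]))
print("a wrong answer:", show(wrong[0]))
```

Conflict-serializability is used because it has a simple polynomial test. A question about view-serializability would need a different checker, but the same generate-and-classify loop.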