Aurora: a new model and architecture for data stream management

Daniel J. Abadi1, Don Carney2, Ugur Cetintemel2, Mitch Cherniack1, Christian Convey2, Sangdon Lee2, Michael Stonebraker3, Nesime Tatbul2, Stan Zdonik2

1 Department of Computer Science, Brandeis University
2 Department of Computer Science, Brown University
3 Department of EECS and Laboratory of Computer Science, M.I.T.

Presenter: Saurin Kadakia

ABOUT ME
- MS CS STUDENT GRADUATING IN DEC 08
- INTERESTED IN DATABASES AND WEB TECHNOLOGY

WHAT ARE MONITORING APPLICATIONS?
- MONITORING APPLICATIONS ARE APPLICATIONS THAT MONITOR CONTINUOUS STREAMS OF DATA.

EXAMPLES
- MILITARY APPLICATIONS
- FINANCIAL ANALYSIS APPLICATIONS
- TRACKING APPLICATIONS

TRADITIONAL DBMS ASSUMPTIONS
- HUMAN ACTIVE, DBMS PASSIVE MODEL
- ONLY CURRENT VALUE IMPORTANT
- TRIGGERS/ASSERTIONS ARE SECONDARY
- QUERIES MUST HAVE EXACT ANSWERS
- NO REAL TIME SERVICE REQUIREMENTS

REALITY FOR MONITORING APPLICATIONS
- DBMS ACTIVE, HUMAN PASSIVE MODEL
- HISTORY OF VALUES REQUIRED
- TRIGGER ORIENTED APPLICATIONS
- APPROXIMATE ANSWERS TO QUERIES
- REAL TIME REQUIREMENTS

SYSTEM MODEL
[Figure: Aurora system model. Labels: User application, QoS spec, Query spec, Aurora System, External data source, Historical Storage, Operator boxes, data flow, Continuous & ad hoc queries, Application administrator.]

QUERY MODEL
Traditional
- Structured Query Language
- Declarative queries on static data
Aurora
- Data flow model for data streams
- The application manager constructs queries using a GUI
Stream Query Algebra (SQuAl)
- Queries are processed by SQuAl operators on the data stream
- Some of the operators are Filter, Map, Union, Aggregate, Join, BSort, and Resample.
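
To make the data flow idea concrete, here is a minimal Python sketch of a few boxes wired into a small query network. It is only an illustration under stated assumptions: Aurora queries are built in a GUI, not in Python, and the names `filter_box`, `map_box`, `union_box`, `aggregate_box`, and the sensor tuples are all hypothetical. The aggregate here uses a simple fixed-size, non-overlapping window, which is a simplification and not a description of SQuAl's actual window semantics.

```python
from typing import Callable, Dict, Iterable, Iterator, List

StreamTuple = Dict[str, float]  # hypothetical tuple layout for this sketch


def filter_box(pred: Callable[[StreamTuple], bool],
               stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Pass through only the tuples that satisfy the predicate."""
    for t in stream:
        if pred(t):
            yield t


def map_box(fn: Callable[[StreamTuple], StreamTuple],
            stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Apply a function to every tuple."""
    for t in stream:
        yield fn(t)


def union_box(*streams: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Merge several streams into one by round-robin interleaving."""
    iterators = [iter(s) for s in streams]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)


def aggregate_box(fn: Callable[[List[StreamTuple]], StreamTuple],
                  size: int,
                  stream: Iterable[StreamTuple]) -> Iterator[StreamTuple]:
    """Emit one output per full window of `size` tuples (assumed window model)."""
    window: List[StreamTuple] = []
    for t in stream:
        window.append(t)
        if len(window) == size:
            yield fn(window)
            window = []


def run_example() -> List[StreamTuple]:
    # Two hypothetical sensor streams feed a Union -> Filter -> Map network.
    sensor_a = [{"id": 1, "temp_c": 20.0}, {"id": 1, "temp_c": 35.0}]
    sensor_b = [{"id": 2, "temp_c": 40.0}, {"id": 2, "temp_c": 10.0}]
    merged = union_box(sensor_a, sensor_b)
    hot = filter_box(lambda t: t["temp_c"] > 30.0, merged)
    out = map_box(
        lambda t: {"id": t["id"], "temp_f": t["temp_c"] * 9 / 5 + 32}, hot)
    return list(out)
```

Each box consumes a stream and produces a stream, so boxes compose by passing one box's output as the next box's input, which mirrors the boxes-and-arrows style of the query model.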
AURORA QUERY MODEL
[Figure: Aurora query network. Labels: data input, boxes b1 to b9, app, QoS spec, continuous query, Connection point, view, ad-hoc query.]

AURORA QoS GRAPH TYPES
[Figure: QoS graph types.]

OPTIMIZATION
[Figure: example query network. Labels: Aggregate, Map, Join, Filter, Hold, pull data, Union, Continuous query, Ad hoc query, BSort, Static storage.]

OPTIMIZATION
- Dynamic continuous query optimization
  - Inserting projections
  - Combining boxes
  - Reordering boxes

AURORA RUNTIME ARCHITECTURE
[Figure: runtime architecture. Labels: inputs, outputs, Storage Manager, Router, σ, μ, Q1, Q2, Qm, Scheduler, Buffer manager, Box Processors, Catalog, Persistent Store, Q1, Q2, Qn, Load Shedder, QoS Monitor.]

SUMMARY
- Solution approach itself
  - Rethink everything for the requirements
- Query model
  - Data flow style query specification
- Optimization
  - Dynamic runtime optimization
- QoS specification based resource management

QUESTIONS?
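
As a supplement to the "Combining boxes" and "Reordering boxes" items on the OPTIMIZATION slide, the following Python sketch shows one common way to reason about both. It is a hedged illustration, not Aurora's optimizer: `FilterBox`, `expected_cost`, `reorder`, `combine`, and the example costs and selectivities are hypothetical, and the ordering rule (descending `(1 - selectivity) / cost`) is the standard heuristic for commuting filters with independent selectivities.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class FilterBox:
    name: str
    predicate: Callable[[dict], bool]
    cost: float         # assumed per-tuple processing cost, must be > 0
    selectivity: float  # assumed fraction of tuples that pass, in [0, 1]


def expected_cost(boxes: Sequence[FilterBox]) -> float:
    """Expected per-input-tuple cost of running the boxes in this order."""
    total, surviving = 0.0, 1.0
    for b in boxes:
        total += surviving * b.cost   # only surviving tuples reach this box
        surviving *= b.selectivity
    return total


def reorder(boxes: Sequence[FilterBox]) -> List[FilterBox]:
    """Put cheap, highly selective filters first."""
    return sorted(boxes,
                  key=lambda b: (1.0 - b.selectivity) / b.cost,
                  reverse=True)


def combine(boxes: Sequence[FilterBox]) -> Callable[[dict], bool]:
    """Fuse a chain of filters into one predicate (a single combined box)."""
    preds = [b.predicate for b in boxes]
    return lambda t: all(p(t) for p in preds)


# Hypothetical example: three commuting filters over stock-tick tuples.
EXAMPLE_BOXES = [
    FilterBox("expensive_loose", lambda t: t["price"] > 0, 5.0, 0.9),
    FilterBox("cheap_tight", lambda t: t["symbol"] == "IBM", 1.0, 0.1),
    FilterBox("medium", lambda t: t["volume"] >= 100, 2.0, 0.5),
]
```

With these made-up numbers, `reorder(EXAMPLE_BOXES)` moves `cheap_tight` to the front, so most tuples are discarded before they reach the expensive box. `combine` returns a single predicate that accepts exactly the tuples the whole chain would accept, which removes the per-box hand-off between adjacent filters.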