AGENDA • ODI Performance • ODI Scheduling • ODI Deployment/Release BI-Quotient www.bi-q.ie ULI BETHKE • • • • • • • BI-Quotient www.bi-q.ie Dublin based Blog www.bi-q.ie ODI 2007 Reviewer two ODI books ODI articles OTN Deputy chair OUG BI SIG. Next event 11th June ODI advanced trainer ODI PERFORMANCE BI-Quotient www.bi-q.ie ODI is a metadata driven (SQL) code generator using code templates (knowledge modules). It uses a Java agent to communicate and send data between source and target systems and the repository over the network. SQL BI-Quotient www.bi-q.ie - > 80%: ODI performance issues = SQL issues => SQL main ODI skill - Perfect your SQL. Advanced SQL. Analytic Functions - Know your database(s) inside out. In particular the target - Understand, write, and modify Knowledge Modules AGENT - BI-Quotient www.bi-q.ie Light weight Java based application Tied to host OS Generates code based on ODI metadata. Communicates source, target, repository. JDBC data transport XML Jetty Interpreters: Jython, JBS, JavaScript, Groovy HSQLDB in memory database Scheduler Sizing AGENT BI-Quotient www.bi-q.ie Target - Least amount of roundtrips. Network (JDBC, XML) - One target database server only (DW) Another Server - ODBC drivers - JEE agent on Weblogic - No support for target OS - Resources on target - DBA INTERFACES BI-Quotient www.bi-q.ie - No!! KM using row by row processing - Use ODI functions rather than DB functions - Don’t overuse CKM (especially for large data volumes) - temp indexes (I$) - Gather statistics (C$, I$, TGT when applicable) - Rule of thumb: Use loader KMs or db link KMs rather than JDBC KMs SOURCE/TARGET BI-Quotient www.bi-q.ie - Schemas on same database server. Physical schema and not data server. - Have sources physically close to target - Minimize impact on source - Chunking CRITICAL PATH NETWORK PATHS: B> E>H B>D> F B>D>G A>C>G BI-Quotient www.bi-q.ie PATH DURATIONS: 6 + 2 + 11=19 6 + 4 + 14=24 6 + 4 + 10=20 9 + 8 + 10=27 CRITICAL PATH MICRO TUNING • • • • • • BI-Quotient www.bi-q.ie JDBC drivers JVM Type 4 or 5 JDBC drivers (Data Direct) Array fetch size. DB packet size. Network packet size. PERFORMANCE MONITORING • • • • • ODI Log Data Mart Facts Dimensions Metrics Frontend BI-Quotient www.bi-q.ie DBMS_SQLTUNE_UTIL0 BI-Quotient www.bi-q.ie • dbms_sqltune_util0.sqltext_to_sqlid • Link to Data Dictionary Tables BI-Quotient MACIEJ KOCON • • • • • Dublin based ODI 2005 (Sunopsis) Reviewer two ODI books Blog www.bi-q.ie maciek@bi-q.ie www.bi-q.ie ORCHESTRATING DWH PROCESSES BI-Quotient www.bi-q.ie • Orchestration of Data Process Flow – Standard DWH Process flow orchestration – Packages in Oracle Data Integrator 10g – Load Plans in Oracle Data Integrator 11g • Process Flow use cases - efficiency analysis • Alternative scheduling – benefits TYPICAL DATA FLOW in DWH step BI-Quotient www.bi-q.ie 1 STAGE DATA EXTRACT loads data from sources E-LT TYPICAL DATA FLOW in DWH step 1 step BI-Quotient www.bi-q.ie 2 STAGE DIMs DATA EXTRACT loads data from sources LABEL provides structured labeling information E-LT BI-Quotient TYPICAL DATA FLOW in DWH step 1 step 2 www.bi-q.ie step 3 STAGE DIMs FACTS DATA EXTRACT loads data from sources LABEL provides structured labeling information FACTS consists of measurements, metrics or facts E-LT BI-Quotient TYPICAL DATA FLOW in DWH step 1 step 2 www.bi-q.ie step 3 STAGE DIMs FACTS DATA EXTRACT loads data from sources LABEL provides structured labeling information FACTS consists of measurements, metrics or facts E-LT data transport & transform units BI-Quotient TYPICAL DATA FLOW in DWH step 1 step www.bi-q.ie 2 step 3 STAGE DIMs FACTS DATA EXTRACT loads data from sources LABEL provides structured labeling information FACTS consists of measurements, metrics or facts ODI 10g Packages ODI 11 Load Plans E-LT data transport & transform units orchestration ORCHESTRATION – ODI PACKAGES using object directly PKG_ABC INT_A PRC_B INT_C PKG_DE INT_D INT_E BI-Quotient www.bi-q.ie ORCHESTRATION – ODI PACKAGES using object directly PKG_ABC SYNCHRONOUS INT_A PRC_B PRC_B INT_C INT_C PKG_DE INT_E www.bi-q.ie using scenarios – compiled code INT_A INT_D BI-Quotient PKG_DE PKG_ABCDE ORCHESTRATION – ODI PACKAGES using object directly PKG_ABC BI-Quotient www.bi-q.ie using scenarios – compiled code SYNCHRONOUS INT_A INT_A PRC_B PRC_B INT_C INT_C PKG_ABCDE PKG_DE INT_D INT_E PKG_DE ASYNCHRONOUS INT_A PRC_B INT_C PKG_DE PKG_ABCDE BI-Quotient ODI 10g vs. ODI 11 www.bi-q.ie FACTS DIMs STAGE PKG_DM PKG_ABC PKG_FG PKG_DE INT_A INT_C INT_F ODI 10g PRC_B INT_C PRC_D PRC_G Packages A D F B E G C BI-Quotient ODI 10g vs. ODI 11 STAGE PKG_DM PKG_ABC www.bi-q.ie DIMs PKG_DE FACTS PKG_FG INT_A INT_C INT_F ODI 10g PRC_B INT_C PRC_D PRC_G Packages ODI 11 Load plans BI-Quotient ODI 10g vs. ODI 11 www.bi-q.ie FACTS DIMs STAGE PKG_DM PKG_ABC PKG_FG PKG_DE INT_A INT_C INT_F ODI 10g PRC_B INT_C PRC_D PRC_G Packages ODI 11 Load plans A D F B E G C SAME EFFECT! BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS www.bi-q.ie Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 B 10 C 10 D E 30 10 30 10 F 10 G 10 A 10 10 10 sequential parallel A 30 10 30 + 30 + 10 = 70 D F E G B C BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS www.bi-q.ie Standard Flow Orchestration: Stage-(stop)DIMs-(stop)Facts 30 B 10 C 10 D E 30 10 30 10 F 10 G 10 A 10 10 10 sequential parallel A 30 10 30 + 30 + 10 = 70 DOWNSIDES: • POSSIBLE INEFFICIENCIES (IDLE RESOURCES) D F E G B C PROCESS FLOW EFFICIENCY ANALYSIS OPTIMIZATION ATTEMPT A 30 B 10 C 10 D E 10 30 F 10 G 10 BI-Quotient www.bi-q.ie BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS www.bi-q.ie OPTIMIZATION ATTEMPT B 10 C sequential D E 10 30 F 10 G 10 A 10 10 30 10 10 parallel A 30 30 10 10 30 + 10 + 10 = 50 10 + 30 70 50 = 1.4 times quicker! UPSIDE: • EFFICIENCY IMPROVED D F B E C G ADVANCED DATA FLOW EXAMPLE BI-Quotient www.bi-q.ie ENTERPRISE DWH DATA FLOW EXAMPLE BI-Quotient www.bi-q.ie ENTERPRISE DWH DATA FLOW EXAMPLE BI-Quotient www.bi-q.ie BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS www.bi-q.ie OPTIMIZATION ATTEMPT B 10 C sequential D E 10 30 F 10 G 10 A 10 10 30 10 10 parallel A 30 30 10 10 30 + 10 + 10 = 50 10 + 30 70 50 = 1.4 times quicker! UPSIDE: • EFFICIENCY IMPROVED DOWNSIDES: • TIMINGS KNOWLEDGE REQUIRED • OVERALL DEPENDECY KNOWLEDGE REQURED D F B E C G BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS www.bi-q.ie OPTIMIZATION ATTEMPT 30 B 10 C D E 10 30 F 10 G 10 10 A 70 10 10 10 70 10 30 30 10 sequential 30 + 30 + 10 = 70 DOWNSIDE: • INEFFICIENCY EXISTS BUT CAN’T BE RESOLVED • CONSUMER WAITING & IMPACT parallel A D F E G B C TRADITIONAL SCHEDULING - LIMITATIONS • • • • • Possible inefficiencies (idle resources) Timings knowledge required Overall dependecy knowledge requred Inefficiency exists but can’t be resolved Consumer waiting & impact BI-Quotient www.bi-q.ie TRADITIONAL SCHEDULING - LIMITATIONS • • • • • BI-Quotient www.bi-q.ie Possible inefficiencies (idle resources) Timings knowledge required Overall dependecy knowledge required Inefficiency exists but can’t be resolved Consumer waiting & impact SCHEDULER BI-Quotient DEPENDENCY DRIVEN SCHEDULING A www.bi-q.ie D B C E A A A B B B C C C D D D E E E A A A B B B C C C D D D E E E BI-Quotient DEPENDENCY DRIVEN SCHEDULING A www.bi-q.ie D PACKGAGES & LOAD PLANS B C E A A A B B B C C C D D D E E E A A A B B B C C C D D D E E E BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS 30 B 10 C D E 10 30 F 10 G 10 10 A 70 10 10 30 10 A 30 B 10 C 10 10 70 10 30 D E 10 10 30 F 10 G 10 10 10 30 30 30 + 30 + 10 = 70 10 10 sequential parallel A www.bi-q.ie D F E G B C BI-Quotient PROCESS FLOW EFFICIENCY ANALYSIS 30 B 10 C D E 10 30 F 10 G 10 10 30 10 A 30 B 10 10 A 70 10 10 C 70 10 30 D E 10 30 sequential parallel A 30 + 30 + 10 = 70 10 F 10 G 10 30 70 10 10 10 10 30 30 www.bi-q.ie 10 70 30 = 2.3 times faster! D F E G B C BI-Quotient www.bi-q.ie DEPENDENCY DRIVEN SCHEDULING • Simplifies orchestrating the flow – only immediate upstream definition required – execution timings not relevant – self-adapts in the most effective way • Improves overall E-LT performance – Less idle resources – better utilization – Independency – unveils its full potential in complex Enterprise class DWHs (Inmon) BI-Quotient www.bi-q.ie DEPENDENCY DRIVEN SCHEDULING • Notifications – errors (+auto-restartability) – finish summary – logging • Multiple/overlapping E-LT streams – load with different frequencies • Parameterization – improved system stress control – process prioritization F I R S T RUN 10 processes F I R S T RUN T O D AY 10 584 processes processes 1389 DEPENDENCIES F I R S T RUN T O D AY 10 584 processes processes 1389 DEPENDENCIES 132 231 SCENARIOS RUN F I R S T RUN T O D AY 10 584 processes processes 1389 DEPENDENCIES 132 231 SCENARIOS RUN TIME 12h43m LOAD PLANS F I R S T RUN T O D AY 10 584 processes processes 1389 DEPENDENCIES 132 231 SCENARIOS RUN TIME 12h43m 4h21m LOAD PLANS SCHEDULER 2.9 TIMES FASTER ENTERPRISE DWH DATA FLOW BI-Quotient www.bi-q.ie RELEASE 1.0 BI-Quotient www.bi-q.ie RELEASE 2.0 TST BI-Quotient www.bi-q.ie TESTING RELEASE 2.0 BI-Quotient www.bi-q.ie DEPLOY RELEASE 2.0 PRD BI-Quotient www.bi-q.ie THE HOT FIX SITUATION RELEASE FREQUENTLY BI-Quotient www.bi-q.ie CI ENVIRONMENT BI-Quotient www.bi-q.ie CI ENVIRONMENT BI-Quotient www.bi-q.ie THE BUILD MASTER BI-Quotient www.bi-q.ie AUTOMATE STUFF BI-Quotient www.bi-q.ie ODI VS. SOURCE CONTROL BI-Quotient www.bi-q.ie ODI STRUCTURE BI-Quotient www.bi-q.ie BEYOND INTRA BUILD DEPENDENCIES BI-Quotient www.bi-q.ie