High Performance PL/SQL
Guy Harrison
Chief Architect, Database Solutions
Copyright © 2008 Quest Software

PL/SQL top tips
1. Optimize network traffic
2. Array processing
3. Set PLSQL_OPTIMIZE_LEVEL
4. Loop processing
5. Recursion
6. NOCOPY
7. Associative arrays
8. Bind variables in NDS
9. Number crunching
10. Using the profiler
11. 11g and other stuff

Tip 0: It's usually the SQL
• Most PL/SQL routines spend most of their time executing SELECT statements and DML
• Tune these first:
  – Identify the proportion of time spent in SQL (profiler, V$SQL)
  – Use SQL Trace + tkprof or the profiler to identify the top SQL
• SQL tuning is a big topic, but:
  – Look at statistics collection policies, in development AND in production
  – Consider the adequacy of indexing
  – Learn hints
  – Exploit the 10g/11g tuning facilities (if licensed)
  – Don't issue SQL when you don't need to

PLSQL_OPTIMIZE_LEVEL
• Introduced in 10g
• Controls transparent optimization of PL/SQL code, similar to reorganizing the code by hand:
  – Level 0: no optimization
  – Level 1: minor optimizations, not much reorganization
  – Level 2 (the default): significant reorganization, including loop optimizations and automatic bulk collect
  – Level 3 (11g only): further optimizations, notably automatic in-lining of subroutines
(A sketch of inspecting and setting the level appears at the end of this section.)

Motivations for stored procedures
• Historically:
  – Security
  – Client-server division of labour
  – Separation of business logic
  – Manageability
  – Portability?
  – Network overhead
  – Divide and conquer complex SQL
• Today:
  – The middle tier provides most of these
  – Network traffic is perhaps the strongest remaining motivation

Optimizing network traffic
• PL/SQL routines outperform other languages most dramatically when network round trips are significant

Network traffic
• Routines that process large numbers of rows and return simple aggregates are also candidates for a stored procedure approach (illustrated by the sketch at the end of this section)

Stored procedure alternative

Network traffic example

Array processing
• Considered bad (row-at-a-time fetching):
  – Excessive loop iterations
  – Increases logical reads (rows in the same block are fetched separately)

Array processing
• Considered better (BULK COLLECT without a LIMIT):
  – Selects all the data in a single operation
  – Large result sets might take longer as memory grows
  – Other concurrent sessions may be left with limited memory for sorts, etc.
  – Out-of-memory errors are possible

Array processing
• Considered best (BULK COLLECT with a LIMIT):
  – Never more than p_array_size elements in the collection
  – Best throughput, acceptable memory utilization (all three patterns are sketched at the end of this section)

[Chart: Array processing at PLSQL_OPTIMIZE_LEVEL=1; elapsed time (s, 0 to 200) versus bulk collect size (1 to 1,000,000), comparing "No bulk collect" with "Bulk collect without LIMIT"]

Bulk Collect and PLSQL_OPTIMIZE_LEVEL
• PLSQL_OPTIMIZE_LEVEL > 1 causes a transparent BULK COLLECT LIMIT 100
• This means that FOR loops can actually be more efficient than an unlimited BULK COLLECT!
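As a rough sketch of how PLSQL_OPTIMIZE_LEVEL can be inspected and changed (PROCESS_ORDERS is a hypothetical procedure name used only for illustration):

-- See which level existing stored units were compiled with
SELECT name, type, plsql_optimize_level
FROM   user_plsql_object_settings;

-- Raise the level for anything compiled in this session
ALTER SESSION SET plsql_optimize_level = 2;

-- Recompile a single unit at an explicit level (level 3 requires 11g);
-- PROCESS_ORDERS is a hypothetical procedure name
ALTER PROCEDURE process_orders COMPILE plsql_optimize_level = 3 REUSE SETTINGS;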
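As a hedged illustration of the stored-procedure alternative for network-heavy work (the SALES table and SALE_VALUE column are assumptions, not taken from the original example), the function below does its row-by-row processing inside the database and sends only a single number back to the client, so the whole job costs one network round trip:

CREATE OR REPLACE FUNCTION high_value_sales_total (p_threshold IN NUMBER)
   RETURN NUMBER
IS
   l_total NUMBER := 0;
BEGIN
   -- All of the row-by-row processing happens inside the database...
   FOR r IN (SELECT sale_value FROM sales) LOOP
      IF r.sale_value > p_threshold THEN   -- stand-in for more complex logic
         l_total := l_total + r.sale_value;
      END IF;
   END LOOP;
   RETURN l_total;                         -- ...and only this value crosses the network
END high_value_sales_total;
/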
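Referring back to the three array processing patterns above, here is a minimal sketch of each, assuming BULK_COLLECT_TAB has columns PK and DATA as in the tkprof excerpt that follows:

DECLARE
   CURSOR c1 IS
      SELECT pk, data FROM bulk_collect_tab;
   TYPE row_tab_t IS TABLE OF c1%ROWTYPE;
   l_rows       row_tab_t;
   p_array_size PLS_INTEGER := 1000;        -- LIMIT size
BEGIN
   -- Considered bad: row-at-a-time processing, one iteration per row
   FOR r IN (SELECT pk, data FROM bulk_collect_tab) LOOP
      NULL;                                  -- process r.pk, r.data
   END LOOP;

   -- Considered better: one fetch, but the whole result set sits in PGA memory
   SELECT pk, data BULK COLLECT INTO l_rows
   FROM   bulk_collect_tab;

   -- Considered best: never more than p_array_size rows in the collection
   OPEN c1;
   LOOP
      FETCH c1 BULK COLLECT INTO l_rows LIMIT p_array_size;
      FOR i IN 1 .. l_rows.COUNT LOOP
         NULL;                               -- process l_rows(i)
      END LOOP;
      EXIT WHEN c1%NOTFOUND;
   END LOOP;
   CLOSE c1;
END;
/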
[Chart: elapsed time (s, 0 to 300) versus array size (1 to 10,000,000), comparing "No Bulk Collect" with "Bulk Collect no limit"]

The tkprof output for the cursor shows 25,000 fetches returning 2,499,998 rows, roughly 100 rows per fetch: the transparent array fetch at work.

SQL ID : 6z2hybgm1ahkh
SELECT /*+ cache(t) */ PK, DATA FROM BULK_COLLECT_TAB

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch    25000      3.26      12.49      73530      98241          0     2499998
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    25002      3.26      12.49      73530      98241          0     2499998

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: 88  (recursive depth: 1)

Reduce unnecessary looping
• Unnecessary loop iterations burn CPU
• Elapsed time: poorly formed loop 34.31 s; well formed loop 3.96 s

Remove loop-invariant terms
• Any term in a loop that does not vary should be extracted from the loop (see the sketch at the end of this section)
• PLSQL_OPTIMIZE_LEVEL > 1 does this automatically

Loop invariant terms relocated

Loop invariant performance improvements
• Elapsed time: original loop 11.09 s; manually optimized loop 5.87 s; PLSQL_OPTIMIZE_LEVEL=2 5.28 s

Recursive routines
• Recursive routines often offer elegant solutions
• However, deep recursion is memory-intensive and usually not scalable (contrasted in a sketch at the end of this section)

[Chart: Recursion memory overhead; PGA memory (MB, 0 to 1,400) versus recursive depth (0 to 10,000,000), recursive versus non-recursive implementations]

NOCOPY
• The NOCOPY clause causes a parameter to be passed "by reference" rather than "by value"
• Without NOCOPY, a copy of each parameter variable is created within the subroutine
• This is particularly expensive when collections are passed as parameters (see the sketch at the end of this section)

NOCOPY performance gains
• 4,000-row, 10-column "table"; 4,000 lookups
• Elapsed time: without NOCOPY 864.96 s; with NOCOPY 0.28 s

Associative arrays
• Traditionally, sequential scans of PL/SQL tables are used for caching database table data

Associative arrays
• Associative arrays allow for faster and simpler lookups (sketched at the end of this section)

Associative array performance
• 10,000 random customer lookups against 55,000 customers
• Elapsed time: sequential scan 29.79 s; associative lookup 0.04 s

Bind variables in Dynamic SQL
• Using bind variables allows sharable SQL, reduces parse overhead and minimizes latch contention
• Unlike other languages, PL/SQL uses bind variables transparently in static SQL
• However, dynamic SQL makes it easy to "forget" (see the sketch at the end of this section)

Using bind variables

Bind variable performance
• 10,000 dynamic SQL calls
• Elapsed time: no binds 7.84 s; bind variables 3.42 s

Number crunching
• Until recently, it has been hard to determine how much time is spent in PL/SQL code versus time in the SQL issued from PL/SQL

Java for computation?
[Chart: elapsed time (s) for Java, PL/SQL and natively compiled PL/SQL on 8i, 9i, 10g and 11g; your results will vary]

Why Native didn't work well for me…
• I need a routine with no SQL and no built-in functions!

The profiler
• DBMS_PROFILER is the best way to find PL/SQL "hot spots" (a sketch appears at the end of this section)

Toad profiler support

Hierarchical profiler
• $ plshprof -output hprof demo1.trc

Plshprof output

DBMS_HPROF tables

Toad Hierarchical profiler

11g and other stuff
• PL/SQL function cache
• 11g Native compilation
• 11g In-lining
• Data types (SIMPLE_INTEGER)
• IF and CASE ordering
• SQL tuning (duh!)
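Following up on the loop-invariant slides above, a minimal sketch of the manual rewrite (the expression and iteration count are made up for illustration):

DECLARE
   c_factor    CONSTANT NUMBER := 1.0825;
   p_base      NUMBER := 42;
   l_invariant NUMBER;
   l_result    NUMBER := 0;
BEGIN
   -- Poorly formed: SQRT(p_base) * c_factor is recomputed on every iteration
   FOR i IN 1 .. 1000000 LOOP
      l_result := l_result + i * SQRT(p_base) * c_factor;
   END LOOP;

   -- Well formed: the invariant term is computed once, outside the loop
   l_invariant := SQRT(p_base) * c_factor;
   l_result := 0;
   FOR i IN 1 .. 1000000 LOOP
      l_result := l_result + i * l_invariant;
   END LOOP;
END;
/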
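For the recursion discussion above, a small sketch of the trade-off: each level of recursion keeps a stack frame alive in PGA memory until the deepest call returns, while the iterative form uses constant memory (the summing routine is purely illustrative):

DECLARE
   FUNCTION sum_recursive (n IN PLS_INTEGER) RETURN NUMBER IS
   BEGIN
      IF n = 0 THEN
         RETURN 0;
      END IF;
      RETURN n + sum_recursive(n - 1);   -- one stack frame per level of depth
   END;

   FUNCTION sum_iterative (n IN PLS_INTEGER) RETURN NUMBER IS
      l_total NUMBER := 0;
   BEGIN
      FOR i IN 1 .. n LOOP               -- constant memory regardless of n
         l_total := l_total + i;
      END LOOP;
      RETURN l_total;
   END;
BEGIN
   DBMS_OUTPUT.put_line(sum_recursive(10000));
   DBMS_OUTPUT.put_line(sum_iterative(10000));
END;
/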
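For NOCOPY, a minimal sketch contrasting pass-by-value with pass-by-reference for a large collection (names and sizes are illustrative):

DECLARE
   TYPE num_tab_t IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
   l_tab num_tab_t;

   PROCEDURE touch_copy (p_tab IN OUT num_tab_t) IS
   BEGIN
      p_tab(1) := p_tab(1) + 1;          -- collection copied in and copied back out
   END;

   PROCEDURE touch_nocopy (p_tab IN OUT NOCOPY num_tab_t) IS
   BEGIN
      p_tab(1) := p_tab(1) + 1;          -- collection passed by reference
   END;
BEGIN
   FOR i IN 1 .. 100000 LOOP             -- build a large collection
      l_tab(i) := i;
   END LOOP;

   FOR i IN 1 .. 4000 LOOP
      touch_copy(l_tab);                 -- pays the copy cost on every call
   END LOOP;

   FOR i IN 1 .. 4000 LOOP
      touch_nocopy(l_tab);               -- avoids the copy cost
   END LOOP;
END;
/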
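For the associative array slides, a sketch of caching a lookup table in an array indexed by its key, assuming a hypothetical CUSTOMERS table with CUSTOMER_ID and CUSTOMER_NAME columns:

DECLARE
   TYPE cust_cache_t IS TABLE OF customers.customer_name%TYPE
        INDEX BY PLS_INTEGER;
   l_cache cust_cache_t;
   l_name  customers.customer_name%TYPE;
BEGIN
   -- Load the cache once
   FOR r IN (SELECT customer_id, customer_name FROM customers) LOOP
      l_cache(r.customer_id) := r.customer_name;
   END LOOP;

   -- Each lookup is now a direct index probe, not a sequential scan
   IF l_cache.EXISTS(1234) THEN          -- 1234: a hypothetical customer_id
      l_name := l_cache(1234);
   END IF;
END;
/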
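For bind variables in native dynamic SQL, a sketch of the "forgotten bind" versus the USING clause (the CUSTOMERS table and the 1 to 10,000 key range are assumptions):

DECLARE
   l_name customers.customer_name%TYPE;
BEGIN
   FOR i IN 1 .. 10000 LOOP
      -- No binds: every call produces a unique statement text, so a hard parse
      EXECUTE IMMEDIATE
         'SELECT customer_name FROM customers WHERE customer_id = ' || i
         INTO l_name;

      -- Bind variable: one sharable statement, parsed once
      EXECUTE IMMEDIATE
         'SELECT customer_name FROM customers WHERE customer_id = :id'
         INTO l_name USING i;
   END LOOP;
END;
/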
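For the profiler slides, a sketch of driving DBMS_PROFILER and the 11g hierarchical profiler. PROCESS_ORDERS is a hypothetical routine, PLSHPROF_DIR is a hypothetical directory object, and the PLSQL_PROFILER_* tables are created by Oracle's proftab.sql script:

-- Classic line-level profiler
BEGIN
   DBMS_PROFILER.start_profiler('process_orders run 1');
   process_orders;                        -- hypothetical routine being profiled
   DBMS_PROFILER.stop_profiler;
END;
/

-- Hot spots by line
SELECT u.unit_name, d.line#, d.total_occur, d.total_time
FROM   plsql_profiler_units u
       JOIN plsql_profiler_data d
         ON d.runid = u.runid AND d.unit_number = u.unit_number
ORDER  BY d.total_time DESC;

-- 11g hierarchical profiler: writes a trace file that plshprof reports on
BEGIN
   DBMS_HPROF.start_profiling(location => 'PLSHPROF_DIR',   -- directory object
                              filename => 'demo1.trc');
   process_orders;
   DBMS_HPROF.stop_profiling;
END;
/
-- Then, at the OS prompt (as on the slide):
--   $ plshprof -output hprof demo1.trc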
PL/SQL Function cache

Function cache example
• Suits deterministic but expensive functions
• Expensive table lookups on non-volatile tables are good candidates (see the sketch below)
• 100 executions with random date ranges of 1-30 days
• Elapsed time: no function cache 5.21 s; function cache 1.51 s

Thank You – Q&A
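A hedged sketch of the 11g function result cache pattern behind the function cache slides. SALES, SALE_DATE and SALE_VALUE are assumed names, and RELIES_ON reflects the 11.1 syntax (later releases track the dependency automatically):

CREATE OR REPLACE FUNCTION sales_total (p_start_date IN DATE,
                                        p_end_date   IN DATE)
   RETURN NUMBER
   RESULT_CACHE RELIES_ON (sales)
IS
   l_total NUMBER;
BEGIN
   -- Expensive lookup against a non-volatile table; repeated calls with the
   -- same date range are answered from the result cache
   SELECT SUM(sale_value)
     INTO l_total
     FROM sales
    WHERE sale_date BETWEEN p_start_date AND p_end_date;
   RETURN l_total;
END sales_total;
/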