DBI_Talk5_2001 - CPAN

advertisement
Advanced DBI
Perl Conference 5.0 Tutorial
July 2001
Tim Bunce
Advanced DBI tutorial
© Tim Bunce
July 2001
Topical Topics











Speed Speed Speed!
Handling handles and binding values
Error checking and error handling
Wheels within wheels
Transactions
DBI for the web
Tainting
Handling LONG/BLOB data
Portability
Proxy power and flexible multiplex
What’s new and what’s planned
2
Advanced DBI tutorial
© Tim Bunce
July 2001
Trimmed Topics and Tips

Lack of time prevents the inclusion of ...

Details of issues relating to specific databases and drivers
– (other than where used as examples of general issues)
– each driver would warrant a tutorial of it’s own!

Non-trivial worked examples
Handy DBIx::* and other DBI related modules

… and anything I’ve not finished implementing as of July 2001 (DBI 1.19) ...


But I hope you’ll agree that there’s ample information in the following 70+
slides…

Tips for those attending the conference tutorial:

Doodle notes from my whitterings about the ‘whys and wherefores’ on your
printed copy of the slides as we go along...
3
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s it all about?

DBI defines and implements an interface to databases

Plug-in driver modules do the database-specific work

DBI provides default methods, functions, tools etc for drivers

Not limited to the lowest common denominator

Designed and built for speed

Powerful automatic error checking built-in

Valuable detailed call tracing/debugging built-in
4
Advanced DBI tutorial
© Tim Bunce
July 2001
A picture is worth?
Perl Application
DBI Module
DBD::Oracle
Oracle Server
DBD::Informix
Informix Server
DBD::Other
Other Server
5
Speed
Speed
Speed!
What helps,what doesn't
Advanced DBI tutorial
© Tim Bunce
July 2001
Give me speed!

DBI was designed for speed from day one

DBI method dispatcher written in hand-crafted XS/C

Dispatch to XS driver method calls optimized

Cached attributes returned directly by DBI dispatcher

DBI overhead is generally insignificant
– So we'll talk about other speed issues instead ...
7
Advanced DBI tutorial
© Tim Bunce
July 2001
Partition for speed

Application partitioning
– do what where? - stop and think - work smarter not harder


Pick the right database for the job, if you have the choice.
Work close to the data
– Moving data to/from the client is always expensive
– Consider latency as well as bandwidth
– Use stored procedures where appropriate
– Do more in SQL where appropriate - get a good book




Multiple simple queries with 'joins' in Perl may be faster.
Use proprietary bulk-load, not Perl, where appropriate.
Consider local caching, in memory or DBM file etc, e.g. Memoize.pm
Mix 'n Match techniques as needed - experiment and benchmark.
.
8
Advanced DBI tutorial
© Tim Bunce
July 2001
Prepare for speed

prepare() - what happens in the server...
– Receive and parse the SQL statement into internal form
– Get details for all the selected tables
– Check access rights for each
– Get details for all the selected fields
– Check data types in expressions
– Get details for all the indices on all the tables
– Develop an optimised query 'access plan' for best execution
– Return a handle for all this cached information

This can be an expensive process
– especially the 'access plan’ for a complex multi-table query

Some databases, like MySQL, don't cache the information but have simpler, and
thus faster, plan creation
.
9
Advanced DBI tutorial
© Tim Bunce
July 2001
How would you do it?
SELECT * FROM t1, t2

WHERE t1.key=1 AND t2.key=2 AND t1.value=t2.value
One possible approach:
Select from one table using its key field (assume both tables have an index on key)
Then, loop for each row returned, and...
select from the other table using its key field and the current row’s value field

But which table to select first?
To keep it simple, assume that both tables have the same value in all rows

If we know that t1.key=1 matches 1000 rows and t2.key=2 matches 1
then we know that we should select from t2 first
because that way we only have to select from each table once

If we selected from t1 first
then we’d have to select from t2 1000 times!

.
An alternative approach would be to select from both and merge
10
Advanced DBI tutorial
© Tim Bunce
July 2001
The best laid plans

Query optimisation is hard
– Intelligent high quality cost based query optimisation is really hard!

Know your optimiser
– Oracle, Informix, Sybase, DB2, SQL Server etc. all slightly different.

Check what it's doing
– Use tools to see the plans used for your queries - very helpful

Help it along

Most 'big name' databases have a mechanism to analyse and store the key
distributions of indices to help the optimiser make good plans.
– Most important for tables with ‘skewed’ (uneven) key distributions
– Beware: keep it fresh, old key distributions might be worse than none

Some also allow you to embed 'hints' into the SQL as comments
– Beware: take it easy, over hinting hinders dynamic optimisation
.
11
Advanced DBI tutorial
© Tim Bunce
July 2001
MySQL’s EXPLAIN PLAN

To generate a plan:
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
tt.ProjectReference, tt.EstimatedShipDate,
tt.ActualShipDate, tt.ClientID,
tt.ServiceCodes, tt.RepetitiveID,
tt.CurrentProcess, tt.CurrentDPPerson,
tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
et_1.COUNTRY, do.CUSTNAME
FROM tt, et, et AS et_1, do
WHERE tt.SubmitTime IS NULL
AND tt.ActualPC = et.EMPLOYID
AND tt.AssignedPC = et_1.EMPLOYID
AND tt.ClientID = do.CUSTNMBR;

The plan is described as results like this:
TABLE
et
tt
et_1
do
TYPE
ALL
ref
eq_ref
eq_ref
POSSIBLE_KEYS
PRIMARY
AssignedPC,ClientID,ActualPC
PRIMARY
PRIMARY
KEY
NULL
ActualPC
PRIMARY
PRIMARY
KEY_LEN
NULL
15
15
15
REF
NULL
et.EMPLOYID
tt.AssignedPC
tt.ClientID
ROWS
74
52
1
1
EXTRA
where used
12
Advanced DBI tutorial
© Tim Bunce
July 2001
Oracle’s EXPLAIN PLAN

To generate a plan:
EXPLAIN PLAN SET STATEMENT_ID = 'Emp_Sal’ FOR
SELECT ename, job, sal, dname
FROM emp, dept
WHERE emp.deptno = dept.deptno
AND NOT EXISTS
(SELECT * FROM salgrade
WHERE emp.sal BETWEEN losal AND hisal);

That writes plan details into a table which can be queried to yield results like this:
ID PAR Query Plan
--- --- -------------------------------------------------0
Select Statement
Cost = 69602
1
0
Nested Loops
2
1
Nested Loops
3
2
Merge Join
4
3
Sort Join
5
4
Table Access Full T3
6
3
Sort Join
7
6
Table Access Full T4
8
2
Index Unique Scan T2
9
1
Table Access Full T1
13
Advanced DBI tutorial
© Tim Bunce
July 2001
Changing plans (hint hint)

Most database systems provide some way to influence the execution plan typically via ‘hints’

Oracle supports a very large and complex range of hints

Hints must be contained within special comments /*+ … */
SELECT /*+ INDEX(table1 index1) */ foo, bar
FROM table1 WHERE key1=1 AND key2=2 AND key3=3;

MySQL has a very limited set of hints

Hints can optionally be placed inside comments /*! … */
SELECT foo, bar FROM table1 /*! USE INDEX (key1,key2) */
WHERE key1=1 AND key2=2 AND key3=3;
.
15
Advanced DBI tutorial
© Tim Bunce
July 2001
Respect your server's SQL cache

Optimised Access Plan etc. is cached within the server
– keyed by the exact original SQL string used
do("insert … $id");
do("insert … ?", undef, $id);

Compare
with

Without placeholders, SQL string varies each time
– so cached one is not reused
– so time is wasted creating a new access plan
– the new statement and access plan are added to cache
– so the cache fills and other statements get pushed out
– on a busy system this can lead to ‘thrashing’
.
16
Advanced DBI tutorial
© Tim Bunce
July 2001
Hot handles

Avoid using $dbh->do(…) in a speed-critical loop
– It’s usually creating and destroying a statement handle each time


Use $sth = $dbh->prepare(…)and $sth->execute() instead
Using prepare() gets a handle on the statement in the SQL cache
– Avoids a round-trip to server for SQL cache check on each use

For example… convert looped
$dbh->do("insert … ?", undef, $id)
into $sth = $dbh->prepare("insert … ?”)
plus a looped $sth->execute($id)

This often gives a significant performance boost
– even where placeholders are emulated, such as MySQL
– because it avoids statement handle creation overhead
.
17
Advanced DBI tutorial
© Tim Bunce
July 2001
Sling less for speed

while(@row = $sth->fetchrow_array) { … }



while($row = $sth->fetchrow_arrayref) { … }



one field: 3,100 fetches per cpu second
ten fields: 1,000 fetches per cpu second
one field: 5,300 fetches per cpu second
ten fields: 4,000 fetches per cpu second
Notes:

Timings made on an old SPARC 10 using DBD::Oracle
Timings assume instant record fetch within driver
Fields all just one char. @row would be even slower for more/bigger fields

Use bind_columns() for direct access to fetched fields without copying


18
Advanced DBI tutorial
© Tim Bunce
July 2001
Bind those columns!

Compare
while($row = $sth->fetchrow_arrayref) {
print “$row->[0]: $row->[1]\n”;
}

with
$sth->bind_columns(\$key, \$value);
while($sth->fetchrow_arrayref) {
print “$key: $value\n”;
}


No row assignment code!
No field access code!
... just magic
19
Advanced DBI tutorial
© Tim Bunce
July 2001
Speedy Summary

Think about the big picture first
– Partitioning, choice of tools etc

Study and tune the access plans for your statements
– Teach your database about any uneven key distributions

Use placeholders - where supported
– Especially for any statements that vary and will be executed often

Replace do() in a loop with prepare() and execute()

Usually… sometimes queries using placeholders are slower!
– Because access plan has to be more general (try using hints in this situation)

Sling less data for faster fetching
– Sling none for fastest!

Other important things to consider…
– hardware, operating system, and database configuration tuning
-
20
Handling your Handles
Get a grip
Advanced DBI tutorial
© Tim Bunce
July 2001
Let the DBI cache your handles

Sometimes it's not easy to hold all your handles
– e.g., library code to lookup values from the database

The prepare_cached() method gives you a client side statement handle
cache:
sub lookup_foo {
my ($dbh, $id) = @_;
$sth = $dbh->prepare_cached("select foo from table where id=?");
return $dbh->selectrow_array($sth, $id);
}

Can avoid the need for global statement handle variables
– which can cause problems in some situations, see later
22
Advanced DBI tutorial
© Tim Bunce
July 2001
Another prepare_cached() example

Can also be used for dynamically constructed statements:
while ( ($field, $value) = each %search_fields ) {
push @sql,
"$field = ?";
push @values, $value;
}
$where = "";
$where = "where ".join(" and ", @sql) if @sql;
$sth = $dbh->prepare_cached("select * from table $where");
$sth->execute(@values);

but beware caching too many variations because, for many databases, each
statement handle consumes some resources on the server (e.g. a cursor)
23
Advanced DBI tutorial
© Tim Bunce
July 2001
Keep a handle on your databases

Connecting to a database can be slow

Try to connect once and stay connected where practical
– We'll discuss web server issues later

The connect_cached() method …





Acts like prepare_cached() but for database handles
Like prepare_cached(), it’s handy for library code
Potentially useful with DBD::Proxy & DBI::ProxyServer
It also checks the connection and automatically reconnects if it's broken
Works well combined with prepare_cached(), see following example
.
24
Advanced DBI tutorial
© Tim Bunce
July 2001
A connect_cached() example

Compare and contrast...
my $dbh = DBI->connect(…);
sub lookup_foo_1 {
my ($id) = @_;
$sth = $dbh->prepare_cached("select foo from table where id=?");
return $dbh->selectrow_array($sth, $id);
}

with...
sub lookup_foo_2 {
my ($id) = @_;
my $dbh = DBI->connect_cached(…);
$sth = $dbh->prepare_cached("select foo from table where id=?");
return $dbh->selectrow_array($sth, $id);
}
Clue: what happens if the database is restarted?
.
25
Advanced DBI tutorial
© Tim Bunce
July 2001
Some connect_cached() gotchas


Because connect_cached() may return a new connection it’s important to specify
all significant attributes at connect time

e.g., AutoCommit, RaiseError, PrintError

So pass the same set of attributes into all connect calls
It’s new and subject to change with experience


The DBI may, in future, optionally keep track of which attributes have been changed, in
which case the connect_cached() method could optionally reset the attributes of the
new connection to be the same as current on the old.
Similar, but not quite the same as Apache::DBI

Doesn’t disable the disconnect() method.
26
Binding (Value Bondage)
Placing values in holders
Advanced DBI tutorial
© Tim Bunce
July 2001
First, the simple stuff...

After calling prepare() on a statement with placeholders:
$sth = $dbh->prepare(“select * from table where k1=? and k2=?”);

Values need to be assigned (‘bound’) to each placeholder before the
database can execute the statement

Either at execute, for simple cases:
$sth->execute($p1, $p2);

or before execute:
$sth->bind_param(1, $p1);
$sth->bind_param(2, $p2);
$sth->execute;
28
Advanced DBI tutorial
© Tim Bunce
July 2001
Then, some more detail...

If $sth->execute(…) specifies any values, it must specify them all

Bound values are sticky across multiple executions:
$sth->bind_param(1, $k1);
foreach my $k2 (@k2) {
$sth->bind_param(2, $k2);
$sth->execute;
}
29
Advanced DBI tutorial
© Tim Bunce
July 2001
Your TYPE or mine?

Sometimes the data type needs to be specified
use DBI qw(:sql_types);
– to import the type constants
$sth->bind_param(1, $value, { TYPE => SQL_INTEGER });
– to specify the INTEGER type
– which can be abbreviated to:
$sth->bind_param(1, $value, SQL_INTEGER);

To just distinguish numeric versus string types, try
$sth->bind_param(1, $value+0);
# bind as numeric value
$sth->bind_param(1, ”$value”);
# bind as string value
– Works because perl values generally know if they are strings or numbers. So...
– Generally the +0 or ”” isn’t needed because $value has the right ‘perl type’ already
30
Advanced DBI tutorial
© Tim Bunce
July 2001
Some TYPE gotchas

Bind TYPE attribute is just a hint
– and like all hints in the DBI, they can be ignored

Most drivers only care about the number vs string distinction
– and ignore other type of TYPE value

For some drivers that do pay attention to the TYPE…
– using the wrong type can mean an index on the value field isn’t used!
-
31
Error Checking & Error Handling
To err is human,
to detect, divine.
Advanced DBI tutorial
© Tim Bunce
July 2001
The importance of error checking

Errors happen!

Failure happens when you don't expect errors!
– database crash / network disconnection
– lack of disk space for insert or select (sort space for order by)
– server math error on select (divide by zero after 10,000 rows)
– and maybe, just maybe, errors in your own code [Gasp!]

Beat failure by expecting errors!

Detect errors early to limit effects
– Defensive Programming, e.g., check assumptions
– Through Programming, e.g., check for errors after fetch loops
.
33
Advanced DBI tutorial
© Tim Bunce
July 2001
Error checking - ways and means

Error checking the hard way...
$h->method or die "DBI method failed: $DBI::errstr";
$h->method or die "DBI method failed: $DBI::errstr";
$h->method or die "DBI method failed: $DBI::errstr";

Error checking the smart way...
$h->{RaiseError} = 1;
$h->method;
$h->method;
$h->method;
34
Advanced DBI tutorial
© Tim Bunce
July 2001
Handling errors the smart way

Setting RaiseError make the DBI call die for you

For simple applications immediate death on error is fine
– The error message is usually accurate and detailed enough
– Better than the error messages some developers use!

For more advanced applications greater control is needed, perhaps:
– Correct the problem and retry
– or, Fail that chunk of work and move on to another
– or, Log error and clean up before a graceful exit
– or, whatever else to need to do

Buzzwords: Need to catch the error exception being thrown by RaiseError
.
35
Advanced DBI tutorial
© Tim Bunce
July 2001
Handling errors the smart way

Life after death:
$h->{RaiseError} =
eval {
foo();
$h->method; #
bar($h);
#
};
if ($@) {
... handle the
}

1;
fails so the DBI calls die
may also call DBI methods
error here ...
Bonus prize:
– Other, non-DBI, code within the eval block may also raise an exception that will be
caught and can be handled cleanly
36
Transactions
To do or to undo,
that is the question
Advanced DBI tutorial
© Tim Bunce
July 2001
Transactions - What's it all about?

Far more than just locking

The A.C.I.D. test
– Atomicity - Consistency - Isolation - Durability

True transactions give true safety
– even from power failures and system crashes!
– Incomplete transactions are automatically rolled-back by the database
server when it's restarted.

Also removes burden of undoing incomplete changes

Hard to implement (for the vendor)
– and can have significant performance cost

A very large topic worthy of an entire tutorial
38
Advanced DBI tutorial
© Tim Bunce
July 2001
Transactions - Life Preservers

Classic: system crash between one bank account being debited and
another being credited.

Dramatic: power failure during update statement on 3 million rows after 2
seconds when only part way through.

Real-world: complex series of inter-related updates, deletes and inserts on
many separate tables fails at the last step due to a duplicate unique key on
an insert.

Transaction recovery would handle all these situations automatically.
– Makes a system far more robust and trustworthy over the long term.

Use transactions if your database supports them.
– If it doesn't and you need them, switch to a different database.
.
39
Advanced DBI tutorial
© Tim Bunce
July 2001
Transactions - How the DBI helps

Tools of the trade:




Set AutoCommit off, and RaiseError on
Wrap eval { … } around the code
Use $dbh->commit; and $dbh->rollback;
Disable AutoCommit via $dbh->{AutoCommit} = 0;
– to enable transactions and thus rollback-on-error

Enable RaiseError via $dbh->{RaiseError} = 1;
– to automatically 'throw an exception' after an error

Add surrounding eval { … }
– catches the exception, the error text is stored in $@

Test $@ and $dbh->rollback() if set
– note that a failed statement doesn’t automatically trigger a transaction rollback
40
Advanced DBI tutorial
© Tim Bunce
July 2001
Transactions - Example code
$dbh->{AutoCommit} = 0;
$dbh->{RaiseError} = 1;
eval {
$dbh->method(…);
# assorted DBI calls
foo(...);
# application code
$dbh->commit;
# commit the changes
};
if ($@) {
warn "Transaction aborted because $@";
$dbh->rollback;
...
}
41
Advanced DBI tutorial
© Tim Bunce
July 2001
Transactions - Further comments

The eval { … } catches all exceptions
– not just from DBI calls. Also catches fatal runtime errors from Perl

Put commit() inside the eval
– ensures commit failure is caught cleanly
– remember that commit() itself may fail for many reasons

Don't forget that rollback() may also fail
– due to database crash or network failure etc.
– so you may want to call eval { $dbh->rollback() };

Other points:
– Always explicitly commit or rollback before disconnect()
– Destroying a connected $dbh should always rollback
– END blocks can catch exit-without-disconnect to rollback and disconnect cleanly
42
Intermission?
Wheels within Wheels
The DBI architecture
and how to watch it at work
Advanced DBI tutorial
© Tim Bunce
July 2001
Setting the scene

Inner and outer worlds


Inner and outer handles


Application and Drivers
DBI handles are references to tied hashes
The DBI Method Dispatcher

gateway between the inner and outer worlds, and the heart of the DBI
… Now we'll go all deep and visual for a while...
45
Advanced DBI tutorial
© Tim Bunce
July 2001
Architecture of the DBI classes #1
‘’outer’’
‘’inner’’
Base classes
providing
fallback
behavior.
DBD::_::common
DBD::_::dr
DBD::_::db
DBD::_::st
DBI
DBI::dr
DBI::db
DBI::st
DBI handle classes visible to applications.
These classes are effectively ‘empty’.
DBD::A::dr
DBD::A::db
DBD::B::dr
DBD::A::st
DBD::B::db
DBD::B::st
Parallel handle classes implemented by drivers.
46
Advanced DBI tutorial
© Tim Bunce
July 2001
Architecture of the DBI classes #2
‘’outer’’
‘’inner’’
method3
method4
DBI::db
Application
makes calls
to methods
using DBI
handle
objects
method1
method2
method3
method4
method5
method6
DBD::A::db
method1
method2
DBI
dispatch
method1
method3
method4
DBI::_::db
method1
method2
method3
method4
method5
DBI::_::common
DBD::B::db
method4
DBI::st
method7
DBD::A::st
method7
method6
47
Advanced DBI tutorial
© Tim Bunce
July 2001
Anatomy of a DBI handle
‘’outer’’
Handle
Ref.
‘’inner’’
DBI::db
DBI::db
Tied
Hash
Hash
Tie
Magic
DBI
Magic
Attribute
Cache
struct imp_dbh_t {
struct dbih_dbc_t {
… DBI data ...
struct dbih_dbc_t com;
… implementers …
… own data ...
}
}
48
Advanced DBI tutorial
© Tim Bunce
July 2001
Method call walk-through

Consider a simple prepare call:
$dbh->prepare(…)

$dbh is reference to an object in the DBI::db class (regardless of driver)

The DBI::db::prepare method is an alias for the DBI dispatch method

DBI dispatch calls the driver’s own prepare method something like this:
my $inner_hash_ref = …
# from tie magic
my $implementor_class = … # from DBI magic data
$inner_hash_ref->$implementor_class::prepare(...)

Driver code gets the inner hash so it has fast access to the hash contents
.
49
Advanced DBI tutorial
© Tim Bunce
July 2001
Watching the DBI in action

DBI has detailed call tracing built-in

The trace can be very helpful in understanding application behavior and for
debugging

Shows parameters and results

Can show detailed driver internal information

Trace information can be written to a file

Not used often enough
Not used often enough
Not used often enough!
Not used often enough!
50
Advanced DBI tutorial
© Tim Bunce
July 2001
Enabling tracing

Per handle
$h->trace($level);
$h->trace($level, $filename);



Only effects that handle and any new child handles created from it
Child handles get trace level of parent in effect at time of creation
Global (internal to application)
DBI->trace(...);


Sets effective global default minimum trace level
Global (external to application)

Enabled using DBI_TRACE environment variable
DBI_TRACE=digits
DBI_TRACE=filename
DBI_TRACE=digits=filename
DBI->trace(digits);
DBI->trace(2, filename);
DBI->trace(digits, filename);
51
Advanced DBI tutorial
© Tim Bunce
July 2001
Our program for today...
1: #!/usr/bin/perl -w
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
use DBI;
$dbh = DBI->connect('', '', '', { RaiseError => 1 });
$upd = $dbh->prepare("UPDATE prices SET price=? WHERE prod_id=?");
$ins = $dbh->prepare("INSERT INTO prices (prod_id,price) VALUES(?,?)");
$rows = $upd->execute(42, "Widgets");
$ins->execute("Widgets", 42) if $rows == 0;
$dbh->disconnect;
52
Advanced DBI tutorial
© Tim Bunce
July 2001
Trace level 1

Trace level 1 shows method results and line numbers:
<<<<-
connect= DBI::db=HASH(0xe0abc) at DBI.pm
STORE('PrintError', 1)= 1 at DBI.pm line
STORE('AutoCommit', 1)= 1 at DBI.pm line
STORE('RaiseError', 1)= 1 at DBI.pm line
line 356.
382.
382.
382.
<<<<<-
prepare('UPDATE …')= DBI::st=HASH(0xe1238) at test.pl line 7.
prepare('INSERT …')= DBI::st=HASH(0xe1504) at test.pl line 8.
execute= '0E0' at test.pl line 9.
execute= 1 at test.pl line 10.
disconnect= 1 at test.pl line 11.
<- DESTROY= undef
...

Use $DBI::neat_maxlen to alter truncation of strings in trace output
53
Advanced DBI tutorial
© Tim Bunce
July 2001
Trace level 2 and above

Trace level 2 shows calls with parameters and more:
-> connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4
'' '' **** HASH(0xe0a10))
<- connect= DBI::db=HASH(0xe0ab0) at DBI.pm line 356.
-> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)
<- STORE= 1 at DBI.pm line 382.
-> prepare for DBD::ODBC::db (DBI::db=HASH(0xe0ab0)~0xe0abc
'UPDATE prices SET price=? WHERE prod_id=?')
<- prepare= DBI::st=HASH(0xe1274) at test.pl line 7.

Trace level 3 and above shows more processing details
54
DBI for the Web
Hand waving from 30,000 feet
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - Connect speed

Databases can be slow to connect
– Traditional CGI forces a new connect per request

Move Perl and DBI into the web server
– Apache with mod_perl and Apache::DBI module
– Microsoft IIS with ActiveState's PerlEx

Connections can then persist and be shared between requests
– Apache::DBI automatically used by DBI if loaded
– No CGI script changes required to get persistence

Take care not to change the shared session behaviour
– Leave the $dbh in the same state you found it!

Other alternatives include
– FastCGI, CGI::SpeedyCGI and CGI::MiniSvr
56
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - Too many connections

Busy web sites run many web server processes
– possibly on many machines

Limits on database connections
– Memory consumption of web server processes
– Database server resources or licensing

Partition web servers into General and Database

Direct database work to the Database web servers
– Use Reverse Proxy / Redirect / Rewrite to achieve this
– Allows each subset of servers to be tuned to best fit workload
– And/or be run on appropriate hardware platforms
.
57
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - State-less-ness

No fixed client-server pair
– Each request can be handled by a different process.
– So can't simply stop fetching rows from $sth when one page is complete and continue
fetching from the same $sth when the next page is requested.
– And transactions can't span requests.
– Even if they could you'd have problems with database locks being held etc.

Need access to 'accumulated state' somehow:
– via the client (e.g., hidden form fields - simple but insecure)
– via the server:




in the database (records in a session_state table keyed by a ‘session id’)
in the web server file system (DBM files etc) if shared across servers
Need to purge old state info if stored on server, so timestamp it
See Apache::Session module
– DBI::ProxyServer + connect_cached with session id may suit, one day
.
58
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - Browsing pages of results

Re-execute query each time then count/discard (simple but expensive)
– works well for small results sets or where users rarely view many pages
– fast initial response, degrades gradually for later pages
– count/discard in server is better but still inefficient for large result sets
– count/discard affected by inserts and deletes from other processes

Re-execute query with where clause using min/max keys from last results
– works well where original query can be qualified in that way, not common

Select and cache full result rows somewhere for fast access
– can be expensive for large result sets with big fields

Select and cache only the row keys, fetch full rows as needed
– optimisation of above, use ROWID if supported, "select … where … in (…)"

If data is static and queries predictable
– then custom pre-built indexes may be useful

The caches can be stored...
– on web server, e.g., using DBM file with locking (see also ‘spread’)
– on database server, e.g., using a table keyed by session id
59
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - Concurrent editing

How to prevent updates overwriting each other?

You can use Optimistic Locking via 'qualified update':
update table set ...
where key = $old_key
and field1 = $old_field1
and field2 = $old_field2

for all other fields
Check the update row count


and …
If it's zero then you know the record has been changed or deleted by another
process
Note


Potential problems with floating point data values not matching due to rounding
Some databases support a high-resolution 'update timestamp' field that can be
checked instead
60
Advanced DBI tutorial
© Tim Bunce
July 2001
Web DBI - Tips for the novice

Test one step at a time
– Test perl + DBI + DBD driver outside the web server first
– Test web server + non-DBI CGI next




Remember that CGI scripts run as a different user with a different environment expect to be tripped up by that.
DBI trace() is your friend - use it.
Use the Perl "-w" and "-T" options. Always "use strict;"
Read and inwardly digest the WWW Security FAQ:


Read the CGI related Perl FAQs:


http://www.w3.org/Security/Faq/www-security-faq.html
http://www.perl.com/perl/faq/
And, if appropriate, read the mod_perl information available from:

http://perl.apache.org
61
Advanced DBI tutorial
© Tim Bunce
July 2001
DBI security tainting

By default the DBI ignores Perl tainting
– doesn't taint returned data
– doesn't check that parameters are not tainted

The Taint attribute enables that behaviour
– If Perl itself is in taint mode

Each handle has it's own inherited Taint attribute
– So can be enabled for particular connections and disabled for particular statements,
for example:
$dbh = DBI->connect(…, { Taint => 1 });
$sth = $dbh->prepare("select * from safe_table");
$sth->{Taint} = 0; # no tainting on this handle
.
62
Advanced DBI tutorial
© Tim Bunce
July 2001
Handling LONG/BLOB data

What makes LONG / BLOB / MEMO data special?


Fetching LONGs - treat as normal fields after setting:



$dbh->{LongReadLen} - buffer size to allocate for expected data
$dbh->{LongTruncOk} - should truncating-to-fit be allowed
Inserting LONGs



Not practical to pre-allocate fixed size buffers for worst case
The limitations of string literals
The benefits of placeholders
Chunking / Piecewise processing not supported


So you're limited to available memory
Some drivers support blob_read()and other private method
63
Portability
A Holy Grail
(to be taken with a pinch of salt)
Advanced DBI tutorial
© Tim Bunce
July 2001
Portability in practice

Portability requires care and testing - it can be tricky

Platform Portability - the easier bit
– Availability of database client software (and server if required)
– Availability of DBD driver
– DBD::Proxy can address both these issues - see later

Database Portability - more tricky
– Differences in SQL dialects cause most problems
– Differences in data types can also be a problem
– Driver capabilities (placeholders etc)

DBIx::Compat module (in DBIx::RecordSet) may be useful.

DBD::AnyDBD

A standard DBI test suite is needed (and planned).
-
65
The Power of the Proxy
& Flexing the Multiplex
Thin clients, high availability ...
and other buzz words
Advanced DBI tutorial
© Tim Bunce
July 2001
DBD::Proxy & DBI::ProxyServer

Networking for Non-networked databases

DBD::Proxy driver forwards calls over network to remote DBI::ProxyServer

No changes in application behavior
– Only the DBI->connect statement needs to be changed

Proxy can be made completely transparent
– by setting the DBI_AUTOPROXY environment variable
– so not even the DBI->connect statement needs to be changed!

DBI::ProxyServer works on Win32
– Access to Access and other Win32 ODBC and ADO data sources

Developed by Jochen Wiedmann
67
Advanced DBI tutorial
© Tim Bunce
July 2001
A Proxy Picture
DBI::ProxyServer
Application
IO:Socket
DBI
Network
DBD::Proxy
Storable
RPC::pClient
RPC::pServer
Storable
DBI
DBD::Foo
IO::Socket
68
Advanced DBI tutorial
© Tim Bunce
July 2001
Thin clients and other buzz words

Proxying for remote access: "thin-client"
– No need for database client code on the DBI client

Proxying for network security: "encryption"
– Can use Crypt::IDEA, Crypt::DES etc.

Proxying for "access control" and "firewalls"
– extra user/password checks, choose port number, handy for web servers

Proxying for action control
– e.g., only allow specific select or insert statements per user or host

Proxying for performance: "compression"
– Can compress data transfers using Compress::Zlib
.
69
Advanced DBI tutorial
© Tim Bunce
July 2001
The practical realities

Modes of operation

Multi-threaded Mode - one thread per connection
– Not safe for production use with perl 5.5 threads, untested with 5.6 ithreads
– DBI is thread-safe but not thread-hot

Forking Mode - one process per connection
– Most practical mode for UNIX-like systems
– Doesn’t scale to large numbers of connections
– Not available on Windows prior to Perl 5.6.0
– Fork emulation in Perl 5.6.0 not tested with DBI yet

Single Connection Mode - one one connection supported
– Obviously only of limited use

No round-robin mode available yet
– patches welcome
70
Advanced DBI tutorial
© Tim Bunce
July 2001
DBD::Multiplex

DBD::Multiplex
– Connects to multiple databases at once (via DBI)
– Single $dbh used to access all databases
– Executes each statement on each database by default

Could be configured to:
– insert into all databases but select from one
– fallback to alternate database if primary is unavailable
– select round-robin / or at random to distribute load
– select from all and check results (pick most common)

Can be used with DBD::Proxy, either above or below

May also acquire fancy caching in later versions

Watch this space: ftp://not.tdlc.com/pub/Multiplex.pm
– developed by Thomas Kishel
71
And finally...
What’s new or planned?
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s new?

(since the book)
$sth->fetchrow_hashref()
– now 25% faster


$dbh->selectrow_hashref()
$dbh->selectrow_arrayref()
– new one-call prepare + execute + fetch utility methods

$sth = $dbh->table_info(…)
– DBI spec now allows parameters to qualify which tables you want info on

$dbh->quote($string, $type)
– now much faster if a $type argument is given (caches quoting data from type_info_all)

$dbh->type_info_all()
– now also much faster due to added caching
.
73
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s new?

(since the book)
$sth = $dbh->primary_key_info($cat, $schema, $table);
– Return information about the primary keys of a table

@keys = $dbh->primary_key($cat, $schema, $table);
– Simpler way to return information about the primary keys of a table

$h->trace(…)



Trace level 1 only shows return from first and last fetch() calls
Trace level 2 only shows returns from fetch() calls
Trace level 3 shows entry and return from fetch() calls
– Those changes will make it easier to use lower trace levels without drowning in data
.
74
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s new?

(since the book)
$DBI::lasth
– the last handle used to make a DBI call, handy for exception handlers

$statement = $dbh->{Statement}
– holds a copy of the last statement prepared or executed by a child statement handle
– e.g. $DBI::lasth->{Statement} in exception handler

$h->{ShowErrorStatement} = 1
– append $h->{Statement} to RaiseError/PrintError messages:
– DBD::foo::execute failed: duplicate key [for ``insert …’’]
– Many drivers should enable it by default. Inherited by child handles.

$h->{FetchHashKeyName} = ’NAME_lc’
– Define the default attribute used by fetchrow_hashref() for key names
– Makes porting scripts much easier (e.g., MySQL <-> Oracle)
.
75
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s planned? (the big stuff)


$h->{FetchHashReuse} = 1;
or
… = {};

Enables greatly optimized fetchrow_hashref() performance

if you promise to play by the rules!
$sql_new = $dbh->preparse($sql_orig, $return, $accept)

Gateway to extra functionality in the future, especially portability
– rewrite comments to suit database
– rewrite placeholders e.g. ‘?’ -> ‘%s’
– rewrite ODBC escape sequences
– (thanks to Scott Hildreth for kick-starting this work)

Definition of array binding API (e.g. for bulk loading)
– (thanks to Dean Arnold for doing much of this work)
.
76
Advanced DBI tutorial
© Tim Bunce
July 2001
What’s planned? (some other stuff)

$dbh->begin_work()


Turn AutoCommit off just until next commit() or rollback()
$h->swap_internal_handle($other_h)


Brain transplants for handles!
Opens the way for some cute features like
– lazy connect and lazy prepare
– reconnect to database but still use original handle


$dbh->selectall_hashref(...)
$sth->fetchall_hashref(...)


Return a ref to hash with one entry per row
Pick which column for key and how columns are stored in the value
– scalar, array ref, hash ref
77
Advanced DBI tutorial
© Tim Bunce
July 2001
And more? (sometime...)

Support PerlIO in Perl 5.6 and above

Faster database and statement handle creation

Get some life into the on-line DBI FAQ-o-matic system

Track bind values and make available via $sth->{BindValues}


include them in $h->{ShowErrorStatement} output
Rename finish() to cancel_select() !

only half joking, I’d leave finish as an alias
78
Advanced DBI tutorial
© Tim Bunce
July 2001
Reference Materials

http://dbi.perl.org/
– The DBI Home Page

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz
– My DBI session at The Perl Conference 1.0 - general introduction

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz
– My DBI session at The Perl Conference 2.0 - more depth

http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk5_2001.tar.gz
– This DBI tutorial at The Perl Conference 5.0 (based on the 3.0 & 4.0 tutorials).

http://www.oreilly.com/catalog/perldbi/
– or http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi
– Programming the Perl DBI - The DBI book!
79
The end.
Till next year…
p.s. I’m also giving seminars
on the Perl Whirl cruise, January 2002
www.GeekCruises.com
Download