Perl DBI Scripting with the ILS
Roy Zimmer
Western Michigan University
What do I need for database interactions with Perl?
DBI – DataBase Interface (always)
DBD – DataBase Driver (one for each type of database you must access)
a database? (some DBDs let you access non-database data, such as CSV files)
some Perl proficiency
But we only care about using our Voyager database, so we’ll stick with that.
What does this mean to you?
DBI – DataBase Interface (already on your Voyager box)
DBD – DataBase Driver (for Oracle, already on your Voyager box)
a database? (already on your Voyager box)
some Perl proficiency (the only thing you supply!)
A simple program
Always needed for database access: load the DBI module with use DBI;
nameletters.pl
Get access information from a setup
file and connect to a database.
File format (the only record):
library.box.university.edu/username/password/VGER
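Reading the setup record could look roughly like the sketch below. It assumes the four fields are host, username, password, and database name, in that order, and that none of them contains a slash. The sub names and hash keys are made up for illustration and are not taken from nameletters.pl.

```perl
use strict;
use warnings;

# Split one setup record of the form host/username/password/dbname
# into its four fields. The hash keys are illustrative names.
sub parse_setup_record {
    my ($record) = @_;
    chomp $record;
    my ($host, $user, $password, $sid) = split m{/}, $record, 4;
    die "setup record needs four '/'-separated fields\n"
        unless defined $sid && length $sid;
    return (host => $host, user => $user, password => $password, sid => $sid);
}

# Read the first (and only) record from a setup file.
sub read_setup_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    my $record = <$fh>;
    close $fh;
    die "$path is empty\n" unless defined $record;
    return parse_setup_record($record);
}
```

Keeping the login details in a separate file means the script itself can be shared without exposing a password.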
Connect to the database and get a handle.
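A minimal sketch of the connect step, assuming the DBD::Oracle "host=...;sid=..." form of data source name and a setup hash with host, user, password, and sid keys (illustrative names). The original script may build its connection string differently. A script would more commonly say use DBI; at the top; require is used here only so the module is loaded when the sub is called.

```perl
use strict;
use warnings;

# Build a DBD::Oracle data source name from a host and a database name (SID).
sub oracle_dsn {
    my ($host, $sid) = @_;
    return "dbi:Oracle:host=$host;sid=$sid";
}

# Connect and return a database handle.
# RaiseError makes DBI die with a message instead of returning undef.
sub connect_to_voyager {
    my (%setup) = @_;
    require DBI;
    return DBI->connect(
        oracle_dsn($setup{host}, $setup{sid}),
        $setup{user}, $setup{password},
        { RaiseError => 1, AutoCommit => 1 },
    );
}
```

The returned handle (often named $dbh) is what every later prepare call goes through.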
A simple program…for each letter, how many patrons’
last names start with that letter
Create the query
sprintf is your friend
A sample query here is:
select count(*)
from wmichdb.patron
where last_name like 'S%'
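Building that query with sprintf might look like the sketch below. The sub name letter_query is made up for illustration; the query text is the sample shown above. Note the doubled %% needed to get a literal percent sign out of sprintf.

```perl
use strict;
use warnings;

# Build the count query for one starting letter with sprintf.
sub letter_query {
    my ($letter) = @_;
    die "expected a single letter A-Z\n" unless $letter =~ /\A[A-Z]\z/;
    return sprintf(
        "select count(*) from wmichdb.patron where last_name like '%s%%'",
        $letter,
    );
}

# Show the generated SQL for the first three letters.
print letter_query($_), "\n" for 'A' .. 'C';
```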
Prepare the query, associating it with a
database, giving it a handle.
Run the query, get a return code.
Get the query result…
…and print it.
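Put together, the create, prepare, execute, fetch, and print steps could be sketched as follows. This is a reconstruction under assumptions, not the original nameletters.pl: $dbh is taken to be an already connected DBI handle, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# For each letter A-Z: create, prepare, and execute the query,
# fetch the single count, and print it zero-padded to six digits.
# $dbh is assumed to be a connected DBI database handle.
sub print_letter_counts {
    my ($dbh) = @_;
    for my $letter ('A' .. 'Z') {
        my $query = sprintf(
            "select count(*) from wmichdb.patron where last_name like '%s%%'",
            $letter,
        );
        my $sth = $dbh->prepare($query)      # statement handle
            or die "prepare failed: " . $dbh->errstr . "\n";
        my $rc = $sth->execute               # return code
            or die "execute failed: " . $sth->errstr . "\n";
        my ($count) = $sth->fetchrow_array;  # one row, one column
        $sth->finish;
        printf "%s: %06d\n", $letter, $count;
    }
    return;
}
```

With a connected handle, calling print_letter_counts($dbh) prints one line per letter.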
Output
A: 003016
B: 007113
C: 005041
D: 003792
E: 001322
F: 002605
G: 003603
H: 005388
I: 000368
J: 001970
K: 004371
L: 003763
M: 007039
N: 001622
O: 001299
P: 003792
Q: 000121
R: 003770
S: 008217
T: 002528
U: 000248
V: 001791
W: 004487
X: 000018
Y: 000562
Z: 000709
Nested query example…getting barcodes of retirees
who’ve borrowed something within the past year
retirees.pl
Outer query
Get patron information that meets the criteria, and connecting information for
barcodes.
(sprintf actually not needed here)
Inner query
Remember the outer query?
Note the positional correspondence between the columns selected in the query and the variables in the while statement that receives each query row.
The inner query has a different
query string name…
…as do the other variables for this query.
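The nested pattern might be sketched like this. The original queries are not reproduced in these notes, so the two SQL strings are passed in by the caller. The column order assumed here (patron ID, last name, first name for the outer query; barcode, date, status for the inner one) and all variable names are illustrative assumptions, not the actual retirees.pl.

```perl
use strict;
use warnings;

# Run an outer query, and for each of its rows run an inner query.
# $dbh is assumed to be a connected DBI handle. $inner_template is a
# sprintf template with one %d slot for the patron ID.
sub report_barcodes {
    my ($dbh, $outer_query, $inner_template) = @_;
    my $outer_sth = $dbh->prepare($outer_query);
    $outer_sth->execute;
    # Variables are listed in the same order as the selected columns.
    while (my ($patron_id, $last_name, $first_name)
               = $outer_sth->fetchrow_array) {
        print "$last_name, $first_name\n";
        # The inner query gets its own query string and statement handle.
        my $inner_query = sprintf($inner_template, $patron_id);
        my $inner_sth   = $dbh->prepare($inner_query);
        $inner_sth->execute;
        while (my ($barcode, $status_date, $status)
                   = $inner_sth->fetchrow_array) {
            print "  $barcode $status_date $status\n";
        }
        $inner_sth->finish;
    }
    $outer_sth->finish;
    return;
}
```

Giving the inner query separate variables matters because the outer statement handle is still in the middle of returning rows while the inner one runs.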
Sample output:
Doe, John R
doe1
pgroup: EMERIT/RET
21141002502810 08/16/1999 Other
99952803601000 11/25/2001 Expired
Doe, Josephine
doe2
pgroup: RETIREDSTF
21141001297289 08/16/1999 Other
21141002300660 08/16/1999 Other
11130755000000 09/04/2002 Expired
Termination Notes
$sth->finish
finish a query
$dbh->disconnect
disconnect from a database
finish is good practice, though it isn’t strictly needed once all rows of a query have been fetched.
disconnect isn’t strictly needed either (for our purposes, read-only). However, if you disconnect while a statement handle still has unfetched rows, DBI complains about active statement handles, so call finish first.
leading us to…
CPAN
the Comprehensive Perl Archive Network
Great resource for *all* kinds of Perl modules
Documentation for modules
- comprehensively so for DBI and DBD
- about 100 pages for DBI and 50 pages for DBD
A bit more documentation…
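$sth->fetchrow_array
returns the next row of the query result as a list of column values, which you can assign to a one-element list such as ($count), to an array, or to a list of variables; it returns an empty list when no rows remain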
There are a number of other ways to get query data, but this seems
to work best.
You can also get lots of details about your database environment via
DBI calls. See the DBI documentation for more information.
A quick little DBI utility
dbddrivers.pl
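A utility along these lines can be very short. The sketch below uses DBI's available_drivers class method; it is a guess at the shape of dbddrivers.pl, not a copy of it.

```perl
use strict;
use warnings;
use DBI;

# Ask DBI which DBD drivers are installed and print one per line.
my @drivers = DBI->available_drivers;
print "Available Perl DBD drivers on this system\n";
print "$_\n" for @drivers;
```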
Output
Available Perl DBD drivers on this system
DBM
ExampleP
File
Gofer
Oracle
Proxy
Sponge
What if you need to look at every bib
record you’ve got?
And you need to access the MARC record
(synonymous with THE BLOB)?
Here’s one approach…
Traversing every record and dealing
with THE BLOB along the way
connect to the database
query bib_data table to get the maximum bib ID
set increment to 50,000
set ending bib ID to increment
set beginning bib ID to 0
while beginning bib ID <= maximum bib ID
    chunkthrudb()
    provide feedback of progress
    beginning bib ID = ending bib ID + 1
    add increment to ending bib ID
end
provide final feedback
sub chunkthrudb
    query bib_data for blob data
        based on bib IDs >= beginning bib ID and
        bib IDs <= ending bib ID
    for each record from the query
        assemble each bib ID's data into a MARC record
        do the required processing for this record
    end
end
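The chunking loop on its own can be sketched in Perl as below, with inclusive chunk boundaries so that no bib ID falls between two chunks. The sub name chunk_through, the callback interface, and the stand-in maximum of 120,000 are assumptions for illustration. In the real program the callback's place is taken by the subroutine that queries bib_data for that range.

```perl
use strict;
use warnings;

# Walk bib IDs 0..$max_bib_id in inclusive chunks, calling $process_chunk
# with the first and last bib ID of each chunk. Returns the chunk count.
sub chunk_through {
    my ($max_bib_id, $increment, $process_chunk) = @_;
    my $begin  = 0;
    my $end    = $increment;
    my $chunks = 0;
    while ($begin <= $max_bib_id) {
        $process_chunk->($begin, $end);
        $chunks++;
        $begin = $end + 1;      # next chunk starts just past this one
        $end  += $increment;
    }
    return $chunks;
}

# Example: 120,000 as a stand-in maximum bib ID, 50,000-record chunks.
chunk_through(120_000, 50_000, sub {
    my ($begin, $end) = @_;
    print "processing bib IDs $begin through $end\n";
});
```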
Discussion of this approach
Alternative method: read the whole database at once!
- might be impossible
- might not be feasible
- probably not efficient
The program presented here…seems to be the better method.
If nothing else, “chunking” your way through your database is more efficient.
We have close to 1.6 million bib records. Based on the program we’re about to see, traversing our database…
using 50,000 record chunks takes about 50 minutes
without “chunking” it takes about 76 minutes
findbadleader.pl
Get our boundary condition
0-50,000 – our first chunk of records
the loop
set up the next chunk
- chunking subroutine (chunkthrudb)
This gets one chunk’s worth of MARC records (blob data)
“seqnum desc” is key to getting the
blob data for each record
Creates the MARC record for each bib retrieved.
For larger records that don’t fit in one table row, the query returns the segments in reverse order, so assemble the record by prepending each segment to what has been read so far.
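The assembly step can be sketched without a database by working on rows that have already been fetched. The row layout (bib ID, seqnum, segment) and the sub name are assumptions for illustration; the rows are assumed to arrive sorted by bib ID and then by seqnum descending.

```perl
use strict;
use warnings;

# Rows are [bib_id, seqnum, segment], assumed sorted by bib_id and then
# by seqnum descending, so each segment is prepended to what came before.
# Returns a list of [bib_id, marc_record] pairs in first-seen order.
sub assemble_marc_records {
    my (@rows) = @_;
    my (@bib_ids, %marc);
    for my $row (@rows) {
        my ($bib_id, $seqnum, $segment) = @$row;
        if (!exists $marc{$bib_id}) {
            push @bib_ids, $bib_id;
            $marc{$bib_id} = '';
        }
        $marc{$bib_id} = $segment . $marc{$bib_id};
    }
    return map { [ $_, $marc{$_} ] } @bib_ids;
}
```

A record that fits in one row simply passes through unchanged.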
Process each record.
In this case, we’re looking for records with bad
leaders (not ending in “4500”).
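A leader check of this kind might look like the sketch below. It relies on the MARC leader being the first 24 characters of the record; the sub name is hypothetical, and records shorter than a full leader are treated as bad.

```perl
use strict;
use warnings;

# True when the 24-character MARC leader does not end in "4500".
# Records too short to hold a full leader also count as bad.
sub has_bad_leader {
    my ($marc) = @_;
    return 1 if length($marc) < 24;
    my $leader = substr($marc, 0, 24);
    return $leader !~ /4500\z/ ? 1 : 0;
}
```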
Being in a bind is a good thing…
sprintf is a friend, but bind is a better friend
usual query method
query with bind values method
Illustrating and contrasting the query with bind method
create, prepare, and execute a query
testbind.pl
prepare and execute a query
prepare and execute a query, improved
prepare moved
outside of the loop
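The improved bind method can be sketched as follows: prepare once with a ? placeholder, then execute once per value. The query is the letter-count query from nameletters.pl, reused here as a stand-in because the actual testbind.pl query is not shown in these notes. $dbh is assumed to be a connected DBI handle, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Prepare once, outside the loop, with a ? placeholder;
# then execute once per letter, binding the value each time.
sub count_by_letter_with_bind {
    my ($dbh, @letters) = @_;
    my $sth = $dbh->prepare(
        "select count(*) from wmichdb.patron where last_name like ?"
    );
    my %count;
    for my $letter (@letters) {
        $sth->execute("$letter%");          # value bound to the placeholder
        ($count{$letter}) = $sth->fetchrow_array;
        $sth->finish;
    }
    return %count;
}
```

Besides saving the repeated prepare, bind values spare you from quoting the value inside the SQL string yourself.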
Using testbind.pl for about 80,000
patron records, these are the results:
Method                                              Runtime (seconds)
Create, prepare, and execute a query                20
Prepare and execute a query (with bind)             11
Prepare and execute a query, improved (with bind)   10
Running reports, queries via Perl on your PC
ActiveState sells products and also has free versions. I recommend their free Perl.
Get the MSI version.
It comes with DBI and DBD for Oracle, among many other modules (and lots of documentation).
Still (!) requires external Oracle client software (at least as of version 5.10.0, build 1003)
Resources
The files listed below are available at
http://homepages.wmich.edu/~zimmer/files/eugm2008
dbi.ppt           this presentation
nameletters.pl    for every letter, find the number of patrons whose last names begin with that letter
retirees.pl       get barcodes of retirees who’ve borrowed something in the last year
dbddrivers.pl     get list of DBD drivers on your system
findbadleader.pl  find bib records with bad leaders (not ending in 4500)
testbind.pl       demonstrates varying query method efficiencies
CPAN
http://cpan.org
DBI
http://search.cpan.org/~timb/DBI-1.605/DBI.pm
DBD Oracle
http://search.cpan.org/~pythian/DBD-Oracle-1.21/Oracle.pm
ActiveState Perl
http://activestate.com/downloads/index.mhtml
There is also a book on Perl and DBI. I’d recommend
using it along with the most current documentation
from the CPAN sites above.
Thank you for listening.
Questions?
roy.zimmer@wmich.edu
Picture © 2008 by Roy Zimmer