Perl/DBI - accessing databases from Perl Dr. Andrew C.R. Martin martin@biochem.ucl.ac.uk http://www.bioinf.org.uk/ Aims and objectives Understand the need to access databases from Perl Know why DBI? Understand the structure of DBI Be able to write a Perl/DBI script to read from or write to a database PRACTICAL: write a script to read from a database Why access a database from Perl? Why access a database from Perl? Send request for page to web server Pages Web browser Web server RDBMS CGI Script External Programs CGI can extract parameters sent with the page request Why access a database from Perl? Populating databases Need to pre-process and re-format data into SQL Intermediate storage during data processing Reading databases Good database design can lead to complex queries (wrappers) Need to extract data to process it Why use DBI? Why Perl/DBI? Many relational databases available. Commercial examples: Oracle DB/2 SQLServer Sybase Informix Interbase Open source examples: PostgreSQL mySQL Why Perl/DBI? Databases use a common query language: ‘structured query language’ (SQL) Queries can easily be ported between different database software Minor variations in more advanced features Proprietary extensions Why Perl/DBI? Can call command-line interface from within your program. $result = `psql -tqc “SELECT * FROM table” `; @tuples = split(/\n/, $result); foreach $tuple (@tuples) { @fields = split(/\|/, $tuple); } Inefficient: new process for each database access Why Perl/DBI? Databases generally provide own APIs to allow access from programming languages e.g. C, Java, Perl Proprietary APIs all differ Very difficult to port software between databases Standardized APIs have thus become available Why Perl/DBI? Perl/DBI is the standardized API for Perl Easy to port Perl scripts from one database to another DBI and ODBC ODBC (Open DataBase Connectivity) Consortium of vendors in early 1990s SQL Access Group October 1992 & 1993, draft standard: ‘Call Level Interface’ (‘CLI’ - an API) Never really adopted Microsoft ‘embraced and extended’ it to create ODBC DBI and ODBC ‘DBPerl’ designed as database interface specifically for Perl4 September 1992 (i.e. pre-ODBC) Just before release, Perl5 announced with OO facilities DBPerl modified to support OO and loosely modelled on CLI standard This became DBI DBI and ODBC ODBC Standard SQL syntax Standard error codes DBI Dodged this issue! Check $DBI::err or $DBI::errstr (DBI provides methods for standard errors, but drivers don’t use them) Many attributes and options to tweak underlying driver Very limited control Meta-data on database structure Tables and types only DBI and ODBC There is an ODBC driver for DBI DBI can be used to access any ODBC database The structure of DBI DBI Architecture DBI Oracle driver mySQL driver PostgreSQL driver Sybase driver DBI Architecture Multi-layer design Perl script DBI Database Independent DBD Database API RDBMS Database Dependent DBI Architecture Oracle DBD::Oracle DBD::Oracle Perl Perl Script Script Perl DBI Script DBD::mysql DBD::Oracle mySQL DBD::Pg DBD::Oracle PostgreSQL DBI Architecture @drivers = DBI->available_drivers(); Returns a list of installed (DBD) drivers DBI Architecture DBD::Oracle Driver Handle Database Handle Statement Handle DBD::pg Driver Handle Database Handle Database Handle Statement Handle Statement Handle Statement Handle Statement Handle Statement Handle Statement Handle DBI Architecture Driver Handles References loaded driver(s) One per driver Not normally referenced in programs Standard variable name $drh DBI Architecture Database Handles Created by connecting to a database $dbh = DBI->connect($datasource, ... ); References a database via a driver Can have many per database (e.g. accessing different user accounts) Can access multiple databases Standard variable name $dbh DBI Architecture Statement Handles Created by ‘preparing’ some SQL $sth = $dbh->prepare($sql); Can have many per database handle (e.g. multiple queries) Standard variable name $sth DBI Architecture DBD::pg Driver Handle Database Handle $dbh = DBI->connect($datasource, ... ); Statement Handle $sth = $dbh->prepare($sql); SQL Preparation Perl script prepare() DBI and Driver Pass statement to database engine Database Parse Statement if valid $sth execute() fetchrow_array() Encapsulate as DBI statement handle Pass statement to database engine Maintain cursor into results Encapsulate Statement Execute Statement Writing Perl/DBI scripts Accessing DBI from Perl Must have the DBI package and appropriate DBD package installed DBD::Oracle, DBD::Pg, DBD::mysql, etc. use DBI; Create a ‘handle’ to access the database: $dbh = DBI->connect($datasource, $username, $password); The DBD module and d/b to be used Data sources $dbh = DBI->connect($datasource, $username, $password); $dbname = “mydatabase”; $dbserver = “dbserver.cryst.bbk.ac.uk”; $dbport = 5432; $datasource = “dbi:Oracle:$dbname”; $datasource = “dbi:mysql:database=$dbname;host=$dbserver”; $datasource = “dbi:Pg:dbname=$dbname;host=$dbserver;port=$dbport”; Format varies with database module Optional; supported by some databases. Default: local machine and default port. Username and password $dbh = DBI->connect($datasource, $username, $password); $username and $password also optional Only needed if you normally need a username/password to connect to the database. Remember CGI scripts run as a special webserver user. Generally, ‘nobody’ or ‘apache’. Database must allow access by this user or specify a different username/password SQL commands with no return value SQL commands other that SELECT don’t return values may return success/failure flag number of entries in the database affected For example: creating a table inserting a row modifying a row SQL commands with no return value e.g. insert a row into a table: INSERT INTO idac VALUES (‘LYC_CHICK’, ‘P00698’) From Perl/DBI: $sql = “INSERT INTO idac VALUES (‘LYC_CHICK’, ‘P00698’)”; $dbh->do($sql); SQL commands that return a single row Sometimes, can guarantee that a database query will return only one row or you are only interested in the first row $sql = “SELECT * FROM idac WHERE ac = ‘P00698’”; @values = $dbh->selectrow_array($sql); Columns placed in an array Could also have been placed in a list: ($id, $ac) = $dbh->selectrow_array($sql); SQL commands that return a multiple rows Most SELECT statements will return many rows Three stages must be performed: preparing the SQL executing it extracting the results SQL commands that return a multiple rows $sql = “SELECT * FROM idac”; $sth = $dbh->prepare($sql); if($sth->execute) { while(($id, $ac) = $sth->fetchrow_array) { print “ID: $id AC: $ac\n”; } } (Can also obtain array or hash reference) NB: statement handle / fetchrow_array rather than db handle / selectrow_array SQL commands that return a multiple rows If you need to stop early you can do: $sql = “SELECT * FROM idac”; $sth = $dbh->prepare($sql); if($sth->execute) { for($i=0; $i<10; $i++) { if(($id, $ac) = $sth->fetchrow_array) { print “ID: $id AC: $ac\n”; } } $sth->finish; } SQL commands that return a multiple rows A utility method is also available to print a complete result set: $sql = “SELECT * FROM idac”; $sth = $dbh->prepare($sql); if($sth->execute) { $nrows = $sth->dump_results; } (Mostly useful for debugging) Repeated SQL calls Often want to repeat essentially the same query, but with some different value being checked. For example: foreach $ac (‘P00698’, ‘P00703’) { $sql = “SELECT * FROM idac WHERE ac = ‘$ac’”; @values = $dbh->selectrow_array($sql); print “@values\n”; } (using special option for 1-row returns) Repeated SQL calls Could also be do: foreach $ac (‘P00698’, ‘P00703’) { $sql = “SELECT * FROM idac WHERE ac = ‘$ac’”; $sth = $dbh->prepare($sql); $sth->execute; while(@values = $sth->fetchrow_array) { print “@values\n”; } } i.e. don’t use special option for 1-row returns $dbh->selectrow_array($sql) Repeated SQL calls Increase in performance by ‘binding’ a variable: $sql = “SELECT * FROM idac WHERE ac = ?”; $sth = $dbh->prepare($sql); foreach $ac (‘P00698’, ‘P00703’) { $sth->bind_param(1, $ac); $sth->execute; while(@values = $sth->fetchrow_array) { print “@values\n”; Parameter Variable } Number to bind } Repeated SQL calls NOTE: Performance increase depends on database and driver Although strings normally enclosed in single inverted commas, the bound variable is not quoted. If you have a number which you need to be treated as a string, then you do: $sth->bind_param(1, 42, SQL_VARCHAR); Summary DBI provides a standard API It does not standardize the SQL DBI is an older standard than ODBC They can be used together and they are both evolving Basic 3-step process: prepare / execute / fetch Shortcut calls for no return or 1-row return Many other functions available