You Can Do It! Start Using Perl to Handle Your Voyager Needs.

Some Perl nomenclature
PERL – Practical Extraction and Report Language
PERL – Pathologically Eclectic Rubbish Lister (not really)
TMTOWTDI – There’s More Than One Way To Do It
(camel by O’Reilly)
Some Perl attributes
it’s a scripted language, not compiled: faster, easier development
runs plenty fast for most things
loose variable typing: both good and bad,
but mostly good
Your first program
#!/usr/local/bin/perl
print "Hello, World\n";
“Protecting” your program
(Unix)
By default, your program is not executable.
chmod 744 your_program
You can execute it as owner of the file; anyone else can only read it.
Variables
$name
can be text or number:
a character,
a whole page of text,
or any kind of number
context determines type
can go “both” ways
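A minimal sketch of how context decides whether a scalar acts as text or as a number. The variable names are only illustrative.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# One scalar, used both ways. The value "42" starts life as text.
my $count = "42";

my $sum    = $count + 8;          # numeric context: 42 + 8
my $label  = $count . " lines";   # string context: concatenation
my $repeat = "-" x 3;             # the x operator repeats a string

print "$sum\n";      # prints 50
print "$label\n";    # prints 42 lines
print "$repeat\n";   # prints ---
```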
Variables, array of
@employee
Array of $employee variables
$employee[0]
$employee[1]
etc.
Variables, hash of
$lib{'thisone'} = "2 days";
$lib{'thatone'} = "5 days";
Thus you can use
$grace_period = $lib{$libname};
When $libname is thatone, $grace_period is 5 days.
Variables, list of
($var1, $var2, $var3) =
function_that_does_something;
This function returns a list of elements.
A list is always inside parentheses ().
Variables, assigning a value to
$var = value or expression;
$array[n] = something;
@array = (); # empty array
%hash = (); # empty hash
Can be done almost anywhere, anytime.
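The variable kinds above, side by side in one small sketch. The employee names and the comma-separated string are made up for illustration; split stands in for "a function that returns a list".

```perl
use strict;
use warnings;

# An array: an ordered list of scalars, indexed from 0.
my @employee = ("Ann", "Bob", "Cy");
my $first    = $employee[0];         # "Ann"
my $howmany  = scalar(@employee);    # 3

# A hash: values looked up by key.
my %lib = (thisone => "2 days", thatone => "5 days");
my $libname      = "thatone";
my $grace_period = $lib{$libname};   # "5 days"

# A list on the left side receives several values at once.
my ($var1, $var2, $var3) = split(/,/, "a,b,c");

print "$first is one of $howmany employees\n";
print "grace period for $libname is $grace_period\n";
print "list gave us $var1, $var2, $var3\n";
```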
Variable scope, and good practices
Best Practices!
use strict;
Requires that you declare all
variables, like one of these:
my $var;
my $var = something;
my @array = ();
Also makes Perl check your code for undeclared variables and other unsafe constructs.
Scope: where in a program a variable exists.
A variable declared like this at the top level of your
program is visible from that point to the end of the file.
A “my” declaration within code grouped
within { and } is visible only in that section
of code; it does not exist elsewhere.
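A short sketch of both kinds of scope under use strict. The names $total and $doubled are illustrative only.

```perl
use strict;
use warnings;

my $total = 0;              # top level: visible from here to the end of the file

foreach my $n (1, 2, 3) {
    my $doubled = $n * 2;   # exists only inside this { } block
    $total += $doubled;
}

# $doubled does not exist here; with "use strict" in effect,
# mentioning it would stop the program at compile time.
print "total is $total\n";  # prints: total is 12
```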
some
Special Variables
$_
default parameter for many functions
$.
current record/line number in current file
$/
input record separator (usually the newline
character)
$,
output field separator for print() (normally an
empty string)
$0
name of the Perl script being executed
$^T
time, in seconds since the epoch, when the script began running
$^X
name (usually the full pathname) of the Perl interpreter
running the current script
@ARGV
array which contains the list of the command
line arguments
@INC
array which contains the list of directories
where Perl looks for modules to load
(for use DBI and other modules)
%ENV
hash variable which contains entries for your
current environment variables
STDIN
read from the standard input file handle
(normally the keyboard)
STDOUT
send output to the standard output file handle
(normally the display)
STDERR
send error output to the standard error file
handle (normally the display)
DATA
file handle referring to any data following
__END__
and dozens more…
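A small sketch that exercises a few of these variables. To stay self-contained it reads from an in-memory string instead of a real file; that substitution and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @seen;
while (<$fh>) {                            # each line lands in $_
    chomp;                                 # chomp works on $_ by default
    push @seen, sprintf("%d:%s", $., $_);  # $. is the current line number
}
close($fh);

{
    local $, = " | ";   # output field separator, changed only inside this block
    print @seen;        # prints: 1:alpha | 2:beta | 3:gamma
    print "\n";
}

print "script name: $0\n";
print "argument count: ", scalar(@ARGV), "\n";
print "PATH is set\n" if exists $ENV{PATH};
```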
String manipulation & other stuff
Given
$stuff = "this is me";
These are not equivalent:
"print $stuff" is "print this is me"
'print $stuff' is 'print $stuff'
`print $stuff` would have the operating system try to execute the command <print this is me>
String manipulation & other stuff
This form should be used as
$something = `O.S. command`;
Example: $listing = `ls *.pl`;
The output of this ls command is
placed, as one possibly large string, into
the variable $listing. This syntax allows
powerful processing capabilities within a
program.
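The three kinds of quotes in one runnable sketch. The echo command is an assumption chosen because it exists on most systems; any command that writes to standard output would do.

```perl
use strict;
use warnings;

my $stuff = "this is me";

my $double = "print $stuff";   # interpolated: print this is me
my $single = 'print $stuff';   # literal:      print $stuff

# Backticks run a command and capture whatever it writes to standard output.
my $captured = `echo $stuff`;
chomp $captured;               # remove the trailing newline from the command output

print "$double\n";
print "$single\n";
print "$captured\n";
```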
printf, sprintf
printf("%s lines here", $counter)
if $counter is 42, we get
42 lines here
for the output
printf("%c lines here", $counter)
if $counter is 42, we get
* lines here
for the output, since 42 is the ASCII
value for “*”, and we’re printing a
character
printf, sprintf
Some additional string formatting…
%s – output length is length($var)
%10s – output length is at least 10
(right justified; longer strings are not cut off)
%10.20s – output length is min 10,
max 20
%-10.10s – output length is exactly 10
(left justified)
Any padding is with space characters.
printf, sprintf
Some additional number formatting…
%d – output length is length($var)
%10d – output length is at least 10
(leading space padded)
%-10d – left justified, at least 10
(trailing space padded)
%-10.10d – at least 10 digits
(leading zero padded; the zeros fill the width, so the - changes nothing)
printf, sprintf
Still more number formatting…
%f – six positions to the right of the decimal by default
%10.10f – guarantees 10 positions to the
right of the decimal (zero padded)
printf, sprintf
printf whatever outputs to the screen
printf FILE whatever outputs to the file open on filehandle FILE
Ex: printf FILE ("this is %s fun\n", $much);
(print works the same way, as to
output destination.)
sprintf is just like printf, except that
its output is returned as a string instead of
being printed.
Ex: $var = sprintf("this is %s fun\n", $much);
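A few of the formats above, tried out with sprintf so the results can be inspected. The sample values are made up; STDOUT stands in for a file handle you have opened for writing.

```perl
use strict;
use warnings;

my $counter = 42;

my $as_string = sprintf("%s lines here", $counter);        # 42 lines here
my $as_char   = sprintf("%c lines here", $counter);        # * lines here
my $padded    = sprintf("[%10s]", "abc");                  # 7 spaces, then abc, inside the brackets
my $clipped   = sprintf("[%-10.10s]", "abcdefghijklmno");  # [abcdefghij]
my $zeroed    = sprintf("%5.5d", $counter);                # 00042
my $rounded   = sprintf("%.2f", 3.14159);                  # 3.14

print "$_\n" for ($as_string, $as_char, $padded, $clipped, $zeroed, $rounded);

# printf with an explicit file handle in front of the format.
printf STDOUT ("this is %s fun\n", "much");
```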
Some other common functions
$var = ƒ(x);
substr
get a portion of a string
index
get the location of a string in a string
length
get the length of a string
ord, chr
convert a character to its ASCII value
and vice versa
uc, lc
convert a string entirely to upper or
lower case
ucfirst, lcfirst
convert the first character of a string to
upper or lower case
split
convert a string into pieces based on a
supplied separator (a character or a pattern)
join
convert a list of strings into one string, joined
by a supplied separator
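Each of these functions applied to one sample string. The string itself is invented for the example.

```perl
use strict;
use warnings;

my $line = "Zimmer,Roy,wmich";

my $part  = substr($line, 0, 6);    # "Zimmer" (start at position 0, take 6 characters)
my $where = index($line, "Roy");    # 7 (positions count from 0)
my $len   = length($line);          # 16
my $code  = ord("*");               # 42
my $char  = chr(42);                # "*"
my $upper = uc("roy");              # "ROY"
my $first = ucfirst("roy");         # "Roy"

my @pieces  = split(/,/, $line);    # ("Zimmer", "Roy", "wmich")
my $rebuilt = join("|", @pieces);   # "Zimmer|Roy|wmich"

print "$part $where $len $code $char $upper $first $rebuilt\n";
```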
Loop stuff
foreach, with an array
@person contains a large number of people
foreach $individual (@person)
{
print "this is person $individual\n";
}
no subscript required!
cleaner code
Loop stuff
while, with an array
@person contains a large number of people
$idnum = 0;
while ($idnum < @person)
{
print "this is person $person[$idnum]\n";
$idnum++;
}
not as clean as using foreach,
but sometimes this makes more sense
Loop stuff
for, with an array
(backwards traversal)
@person contains a large number of people
for ($idnum = scalar(@person) - 1; $idnum >= 0; $idnum--)
{
print "this is person $person[$idnum]\n";
}
conventional for loop
Loop stuff,
more control
@person contains a large number of people
for ($idnum = scalar(@person) - 1; $idnum >= 0; $idnum--)
{
next if ($person[$idnum] eq "Harry");
print "this is person $person[$idnum]\n";
}
skip anybody named Harry
Loop stuff,
more control
@person contains a large number of people
for ($idnum = scalar(@person) - 1; $idnum >= 0; $idnum--)
{
last if ($person[$idnum] eq "Penelope");
print "this is person $person[$idnum]\n";
}
next_program_line;
once we get to Penelope, leave the loop, and
resume execution at next_program_line
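Both controls in one runnable loop. The four names and the @printed array are assumptions added so the effect can be seen and checked.

```perl
use strict;
use warnings;

my @person = ("Penelope", "Tom", "Harry", "Dick");
my @printed;    # remembers who was printed, for checking afterwards

# Backwards traversal: from the last index down to 0.
for (my $idnum = scalar(@person) - 1; $idnum >= 0; $idnum--)
{
    next if ($person[$idnum] eq "Harry");       # skip Harry, go on to the next person
    last if ($person[$idnum] eq "Penelope");    # leave the loop at Penelope
    print "this is person $person[$idnum]\n";
    push @printed, $person[$idnum];
}
print "done\n";    # execution resumes here after "last"
```

Dick and Tom are printed; Harry is skipped, and the loop ends when it reaches Penelope.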
One last bit of array stuff…
@person = ();
…
while ("reading a file") # this line is not real code!
{
$name = substr($file_line, 0, 30);
push @person, $name;
}
populate an array simply,
no hassle with an index variable
File input and output (I/O)
“slurping” a file
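A sketch of slurping, assuming a small throwaway file created with the core File::Temp module so the example can run anywhere. In your own program you would open your real input file instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Make a small throwaway file so the sketch is self-contained.
my ($out, $file) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\nthird line\n";
close($out);

open(my $in, '<', $file) or die "cannot read $file: $!";
my @lines = <$in>;      # list context: every line arrives at once, no loop
close($in);

chomp(@lines);          # remove the newline from every element
print scalar(@lines), " lines read\n";    # prints: 3 lines read
```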
File test operators
Here are a few:
-d
tests if the file is a directory
-e
tests if the file exists
-s
returns the size of the file in bytes
-x
tests if the file can be executed
Example:
$filesize = -s $file;
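The file tests above, tried on a temporary five-byte file. The file is created with the core File::Temp module only so the sketch is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "12345";      # five bytes, no newline
close($fh);

my $exists   = -e $file ? "yes" : "no";   # the file exists
my $is_dir   = -d $file ? "yes" : "no";   # it is a plain file, not a directory
my $filesize = -s $file;                  # size in bytes

print "exists: $exists, directory: $is_dir, size: $filesize\n";
```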
Date and Time in Perl, basic
### "create" today's date
my ($sec, $min, $hour,
$day, $month, $year,
$wday, $yday, $isdst) = localtime;
This gets the date and time information
from the system.
my $today =
sprintf ("%4.4d.%2.2d.%2.2d",
$year+1900, $month+1, $day);
This puts today’s date in “Voyager”
format, 2006.04.26
($year counts from 1900 and $month from 0,
hence the +1900 and +1.)
Date and Time in Perl
The program, datemath.pl, is part of your
handout. The screenshot below shows its
output.
Regular expressions, matching
m/PATTERN/gi
If the m for matching is not there, it is
assumed.
The g modifier means to find globally, all
occurrences.
The i modifier makes matching
case-insensitive.
Modifiers are optional; others are
available.
Regular expressions, substituting
s/PATTERN/REPLACEWITH/gi
The s says that substitution is the intent.
The g modifier means to substitute
globally, all occurrences.
The i modifier makes matching
case-insensitive.
Modifiers are optional; others are
available.
Regular expressions, transliterating
tr/SEARCHFOR/REPLACEWITH/cd
The tr says that transliteration is the
intent.
The c modifier means transliterate
whatever is not in SEARCHFOR.
The d modifier means to delete found but
unreplaced characters.
Modifiers are optional; others are
available.
Regular expressions
# if the pattern matches
if ($var =~ /regular expression/)
{
make_something_happen;
}
Regular expressions
# if the pattern does NOT match
if ($var !~ /regular expression/)
{
make_something_happen;
}
Regular expressions
# contents of $var will be changed
# 1st occurrence of this changes to that
$var =~ s/this/that/;
# all occurrences of this are changed to that
$var =~ s/this/that/g;
Regular expressions
# contents of $var will be changed
# converts all lower case letters to
# upper case letters
$var =~ tr/a-z/A-Z/;
Regular expressions
Some simple stuff to get started…
m/thisx*/
* find zero or more ‘x’ right after ‘this’
m/thisx+/
+ find one or more ‘x’ right after ‘this’
m/thisx?/
? find zero or one ‘x’ right after ‘this’
m/[0-9]{5}/
find exactly five consecutive digits
m/[0-9]{5,}/
find at least five consecutive digits
m/[0-9]{5,7}/
find from five to seven consecutive digits
Regular expressions
Some more simple stuff…
m/^this/
find ‘this’ only at the beginning of the string
m/this$/
find ‘this’ only at the end of the string
Some specific characters:
\n
newline (line feed)
\r
carriage return
\t
tab
\f
form feed
\0
null
Some generic characters:
\d
any digit
\D
any non-digit character
\s
any whitespace character
\S
any non-whitespace character
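A sketch that puts several of these pieces to work. The sample strings are invented; the (my $copy = ...) =~ form is simply one way to change a copy and leave the original alone.

```perl
use strict;
use warnings;

my $line = "Patron 0012345 has 2 items";

my $has_id  = ($line =~ /[0-9]{5,7}/) ? 1 : 0;   # five to seven digits in a row
my $starts  = ($line =~ /^patron/i)   ? 1 : 0;   # ^ anchors at the start, i ignores case
my $ends    = ($line =~ /items$/)     ? 1 : 0;   # $ anchors at the end
my $no_tabs = ($line !~ /\t/)         ? 1 : 0;   # true when no tab is present

(my $squeezed = "a   b    c") =~ s/\s+/ /g;      # each run of whitespace becomes one space
(my $shouted  = "voyager")    =~ tr/a-z/A-Z/;    # lower case to upper case

print "$has_id $starts $ends $no_tabs [$squeezed] $shouted\n";
```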
Regular expressions
Look in the Perl book (see Resources) for an
explanation of how to use regular
expressions. You can also look elsewhere,
at Perl sites and in other books, for more
information and examples.
Looking at explained examples can be very
helpful in learning how to use regular
expressions.
(I’ve enclosed some I’ve found useful; see
Resources.)
Regular expressions
Very powerful mechanism.
Often hard to understand at first glance.
Can be rather obscure and frustrating!
If one way doesn’t work, keep at it. Most
likely there is a way that works!
DBI stuff
What is it and why might I want it?
DBI is the DataBase Interface module for
Perl. You will also need the specific DBD
(DataBase Driver) module for Oracle.
This enables Perl to perform queries
against your Voyager database.
Both of these should already be on your
Voyager box.
DBI stuff, how to
You need four things to connect to Voyager:
machine name: your.machine.here.edu
username: your_username
password: your_password
SID: VGER (or LIBR)
DBI stuff, how to
$dbh is the handle for the database
$sth is the handle for the query
Create a query…then execute it.
NOTE: SQL from Access will most
likely NOT work here!
DBI stuff, how to
Get the data coming from your query.
You’ll need a Perl variable for each column
returned in the query.
Commonly a list of variables is used; you
could also use an array.
Typically, you get your data in a while loop,
but you could have
($var) = $sth->fetchrow_array;
when you know you’re getting a single value.
DBI stuff, how to
When you’re done with a query, you should
finish it. This becomes important when you
have multiple queries in succession.
You can have multiple queries open at the
same time. In that case, make the statement
handles unique…$sth2, or $sth_patron.
Finally, you can close your database
connection.
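The whole sequence (connect, prepare, execute, fetch in a loop, finish, disconnect) might look like the sketch below. It is not the presenter's code: the subroutine name, the query, and the table and column names are placeholders, and your Voyager tables may need a schema prefix. DBI is loaded inside the subroutine so the file still compiles on a machine without DBI installed.

```perl
use strict;
use warnings;

# A sketch only. Host, SID, credentials, table and column names
# are placeholders, not values taken from the presentation.
sub list_patrons {
    my ($host, $sid, $username, $password) = @_;

    require DBI;    # needs DBI and DBD::Oracle to be installed

    # $dbh is the handle for the database
    my $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid",
                           $username, $password,
                           { RaiseError => 1 });

    # $sth is the handle for the query: create it, then execute it
    my $sth = $dbh->prepare("select patron_id, last_name from patron");
    $sth->execute;

    # one Perl variable for each column returned by the query
    my @rows;
    while (my ($patron_id, $last_name) = $sth->fetchrow_array) {
        push @rows, "$patron_id $last_name";
    }

    $sth->finish;         # done with this query
    $dbh->disconnect;     # done with the database
    return @rows;
}

# Example call; it needs a reachable Voyager database, so it is commented out:
# print "$_\n" for list_patrons("your.machine.here.edu", "VGER",
#                               "your_username", "your_password");
```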
CPAN
Comprehensive Perl Archive Network
http://cpan.org
You name it and somebody has probably written
a Perl module for it, and you’ll find it here.
There are also good Perl links here; look for the
Perl Bookmarks link.
CPAN
Installing modules
You need to be root for systemwide installation
on Unix systems.
On Windows machines, you’ll probably need to
be administrator.
You can install them “just for yourself” with a bit
of tweaking, and without needing root access.
If you’re not a techie, you’ll probably want to
find someone who is, to install modules.
Installing modules from CPAN is beyond the
scope of this presentation.
Perl on your PC
You can get Perl for your PC from ActiveState.
They typically have two versions available; I
recommend the newer one. Get the MSI version.
Installation is easy and painless, but it may take
some time to complete.
A lot of modules are included with this
distribution; many additional modules are
available. Module installation is made easy via
the Perl Package Manager (PPM).
Perl on your PC
To use ppm in ActiveState Perl, open a command
prompt window and enter ppm.
Help is available by simply typing help.
Some useful commands in ppm are:
query * – show what’s already installed
search pkg – look for package pkg at ActiveState’s repository
install pkg – retrieve and install package pkg on your machine
Perl on your PC
If you can’t find the module you’re looking for at
ActiveState, you should be able to find it at
CPAN, and will have to install it manually.
Voyager examples
Based on my experience (your mileage may vary),
there are two main types of applications, for
Voyager:
reports, or data retrievals, from the database
data manipulation, mainly of files to be imported
Voyager example,
a simple report
This report finds patrons with
multiple email addresses
Tells the system where to find Perl
Will be querying the Voyager database
Set up output file name
Carefully open the output file for use
Keep password data in ONE file. Why?
one point of maintenance (less work when the password changes)
reduces opportunities for error
anyone can see the source code without seeing the password data
Get some information for each patron
Get the patron identifying data in a loop, and
set up the query to get the email address(es) for this patron
In an “inner” loop, get email address data for this patron.
Preformat the fields for future output.
Populate the address array with each address for this patron.
(note that this array starts out empty for each patron)
If this patron has more than one email address, then we are interested
Remove trailing spaces from the name parts,
then concatenate the parts together
Now output the multiple email addresses for this patron
A sample of the output
Voyager example,
some data manipulation
This program processes incoming authority records:
records whose 010 |a fields begin with “sj” are removed
remaining records are stripped of the 9xx fields
Specify the file to be processed as a command line parameter.
If no parameter is supplied, display a short paragraph that
shows how to use this program, then exit.
Set up the |a subfield “delimiter”.
This will be used later in the 010 field.
We could have used $ARGV[0] as the filename variable, but
using $marcin makes the program more readable.
An example of “slurping”, reading the file into an array without
resorting to a loop.
This is an example of early code sticking around too long. It should
be rewritten:
Insert this line before accessing the file:
$/ = chr(0x1d); # use the MARC end-of-record terminator
Then get the data this way:
@marcrecords = <marcin>;
The above code can be eliminated by these simple changes.
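The record-separator change can be tried without a MARC file. In this sketch three stand-in "records" sit in an in-memory string, and a lexical handle $fh takes the place of the program's own file handle; both are assumptions made so the example can run anywhere.

```perl
use strict;
use warnings;

# Three stand-in "records", each ended by the MARC record terminator (hex 1D).
my $rt   = chr(0x1d);
my $data = "record one${rt}record two${rt}record three${rt}";

open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";

my @marcrecords;
{
    local $/ = $rt;          # read up to each record terminator, not each newline
    @marcrecords = <$fh>;    # slurp: one array element per record
}
close($fh);

print scalar(@marcrecords), " records read\n";    # prints: 3 records read
```

Each element still ends with the terminator character, just as a line read the usual way still ends with its newline.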
The end result is that we have an array of the MARC
records from the input file.
Determine the base address for data in this record,
and get ready to read the directory.
Get each field’s particulars, figure out where its data is,
and read the data.
We look for field 010, subfield a.
If subfield a is found, does its data start with “sj”? If so, we do
not want this record.
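The steps above can be sketched as follows. This is not noauthsj.pl; it relies on the standard MARC 21 layout (a 24-character leader with the base address in positions 12-16, then 12-character directory entries), and the tiny record it builds and parses contains made-up data.

```perl
use strict;
use warnings;

my $ft = chr(0x1e);   # field terminator
my $sd = chr(0x1f);   # subfield delimiter
my $rt = chr(0x1d);   # record terminator

# Build a tiny stand-in record: a 010 field and one 9xx field (made-up data).
my ($directory, $body) = ("", "");
foreach my $field ("010  ${sd}asj 12345", "950  ${sd}alocal note") {
    my $tag  = substr($field, 0, 3);
    my $data = substr($field, 3) . $ft;    # indicators, subfields, terminator
    # directory entry: 3-character tag, 4-digit length, 5-digit start position
    $directory .= sprintf("%3s%04d%05d", $tag, length($data), length($body));
    $body      .= $data;
}
$directory .= $ft;
my $base   = 24 + length($directory);
my $leader = sprintf("%05dnz  a22%05dn  4500", $base + length($body) + 1, $base);
my $record = $leader . $directory . $body . $rt;

# Parse it: the base address sits in leader positions 12-16.
my $base_address = substr($record, 12, 5) + 0;
my $keep = 1;
my @tags;

# Directory entries start right after the leader and end at a field terminator.
for (my $pos = 24; substr($record, $pos, 1) ne $ft; $pos += 12) {
    my $tag    = substr($record, $pos, 3);
    my $length = substr($record, $pos + 3, 4);
    my $start  = substr($record, $pos + 7, 5);
    my $data   = substr($record, $base_address + $start, $length);
    push @tags, $tag;
    # 010 subfield a beginning with "sj" marks a record we do not want
    $keep = 0 if ($tag eq "010" && $data =~ /${sd}asj/);
}

print "tags: @tags; keep: $keep\n";    # prints: tags: 010 950; keep: 0
```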
Looks like this record is a keeper.
If this is a 9xx field, i.e., the tag id starts with ‘9’, keep track
of these fields in an array until we’ve looked at all the fields.
When done reading the record that’s a keeper, we need to
delete the 9xx fields, and output the record.
If the record is not a keeper, put it in the deleted file.
Resources
Learning Perl
Perl in a Nutshell
I use these two a lot
Programming Perl
Perl Cookbook
All books are from O’Reilly.
Resources
Perl Best Practices
Perl Hacks
Intermediate Perl
Advanced Perl Programming
All books are from O’Reilly.
These will start to be useful once you have some Perl experience.
Resources
Active State Perl
http://activestate.com/Products/Download/Download.plex?id=ActivePerl
CPAN
http://cpan.org
a great link to links
http://www.thepeoplestoolbox.com/programmers/perl
Resources
The files listed below are available at
http://homepages.wmich.edu/~zimmer/files/eugm2007
youcandoitPerl.ppt
this presentation
findmanyemail.pl
find patrons with multiple email addresses
(available by request)
noauthsj.pl
delete record if 010 |a starts with “sj”, and strip 9XX
fields from remaining records
datemath.pl
some program code for math with dates
snippet.grep
various regular expressions I’ve found useful
Thanks for listening.
Questions?
roy.zimmer@wmich.edu
269.387.3885
Picture © 2006 by Roy Zimmer
Picture © 2005 by Roy Zimmer