A Strand of Perls: Some Home Grown Utilities

Syllabus
Our New Books List
Call Number Sorting
Getting Operator Profiles
QPID – Quick Patron Information Dump (cupid…)
Why present another new books list?
Different strokes for different folks…
Given:
- professors don’t care about call numbers; they just want to go to their area and see what’s new – information sorted by department
- information needed on a monthly basis
- can go back through data for several previous months
Here’s an overview…
New Books List Process at WMU
get last month’s acquisitions
-> break up by department (Department A, Department B, Department C, Department X)
-> one text output file
-> ftp to Batch PC
-> put on library LAN for the Web Office
New Books List Process at WMU
Our production-type jobs get the
database password from a file, for
easy maintenance.
Then use DBI to set up access to the
database.
New Books List Process at WMU
The query (sprintf wrapper removed for clarity)
New Books List Process at WMU
Get data from the query in a
loop and put in an array
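The setup and fetch steps above can be sketched roughly as follows. This is a minimal illustration, not the production script: the subroutine names, the password file layout (password on the first line), and the DSN and user name in the comments are all assumptions.

```perl
use strict;
use warnings;

# Read the database password from the first line of a file.
# Assumption: the file holds nothing but the password.
sub read_password {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $password = <$fh>;
    close $fh;
    die "Password file $path is empty\n" unless defined $password;
    $password =~ s/\s+\z//;    # drop the newline and any trailing whitespace
    return $password;
}

# Set up database access through DBI. DSN and user are supplied by the caller.
sub connect_db {
    my ($dsn, $user, $password) = @_;
    require DBI;    # loaded on demand, so read_password works without DBI
    return DBI->connect($dsn, $user, $password, { RaiseError => 1 });
}

# Run a query and collect every row into an array of array references.
sub fetch_rows {
    my ($dbh, $sql) = @_;
    my $sth = $dbh->prepare($sql);
    $sth->execute;
    my @rows;
    while (my @row = $sth->fetchrow_array) {
        push @rows, [@row];
    }
    return @rows;
}

# Example use (every name here is a placeholder):
#   my $dbh  = connect_db('dbi:Oracle:VGER', 'ro_user', read_password('db.pwd'));
#   my @rows = fetch_rows($dbh, $query);
#   $dbh->disconnect;
```

Keeping the password in one file means a password change touches that file only, not every script.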
New Books List Process at WMU
Get rid of headphones!
New Books List Process at WMU
Create sort vector and put in array
New Books List Process at WMU
Got the deduping code from one
of the O’Reilly Perl books.
Data will implicitly be in
call number order due to sort
vector structure.
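For reference, the usual deduping idiom from the Perl literature looks like the sketch below. Whether this is exactly the code on the slide is an assumption, and the sample sort vectors are made up for illustration. Because each vector starts with the normalized call number, a plain string sort puts the records in call number order.

```perl
use strict;
use warnings;

# Keep the first occurrence of each record and drop later duplicates.
sub dedupe {
    my @records = @_;
    my %seen;
    return grep { !$seen{$_}++ } @records;
}

# Hypothetical sort vectors: normalized call number first, display fields after.
my @vectors = (
    'QA000076|Programming Perl',
    'PS003545|Collected Poems',
    'QA000076|Programming Perl',
);

my @unique = dedupe(@vectors);
my @sorted = sort @unique;    # string sort; the leading key decides the order
print "$_\n" for @sorted;
```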
…line noise…?
Digression…
Speaking of line noise…
Broken up for clarity
This puts the line count of a MARC file
into a shell script.
New Books List Process at WMU
Now we need to get the results classified
by department, going by call number ranges.
raw ranges file…
New Books List Process at WMU
The call number range specifications
are normalized in the same manner used
for sorting.
Great for the computer, not so easy
for us humans.
Created a utility to make a
human-readable version.
formatted ranges file…
New Books List Process at WMU
The range data is read into arrays.
(If a syntax error is found, the
program stops and shows where it is.)
Then loop for each department. If the
current call number falls within the
current range, it goes into the current
department file.
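A rough sketch of the range check follows. The ranges file format shown here (department|low|high, one range per line) is hypothetical, as are the subroutine names and sample keys. The point is only that once ranges and call numbers are normalized the same way, a string comparison decides membership.

```perl
use strict;
use warnings;

# Parse range lines of the assumed form "Department|low|high".
# Stops with the offending line number if a line is malformed.
sub parse_ranges {
    my @lines = @_;
    my @ranges;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($dept, $low, $high) = split /\|/, $line;
        die "Bad range at line $line_no: $line\n"
            unless defined $high && $low le $high;
        push @ranges, { dept => $dept, low => $low, high => $high };
    }
    return @ranges;
}

# Return every department whose range contains the normalized key.
sub departments_for {
    my ($key, @ranges) = @_;
    return map  { $_->{dept} }
           grep { $key ge $_->{low} && $key le $_->{high} } @ranges;
}
```

A call number can fall into more than one department's range, so the lookup returns a list rather than a single name.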
New Books List Process at WMU
The output files are sorted.
For our final processing, we loop through
each of these sorted files of raw data. We
ignore the call number chunks created
during the normalization process.
The desired fields are concatenated and
line-wrapped.
field1 | field2 | etc.
how this was done…
New Books List Process at WMU
As we loop through the contents of each
departmental file:
We split up the sort vector, and store the
output fields with a vertical bar in between.
New Books List Process at WMU
Some additional processing is done,
including the always visually entertaining
regular expression manipulations.
New Books List Process at WMU
The output is line-wrapped prior to writing
to the file.
You’ll need some initial setup for the above
wrap to work.
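One common way to do this in Perl is the core Text::Wrap module, where the "initial setup" is setting its column width. That the original script used Text::Wrap is an assumption; the width, the indent, and the subroutine name below are illustrative.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Setup: Text::Wrap takes its line width from a package variable.
$Text::Wrap::columns = 60;    # hypothetical width

# Join the output fields with a vertical bar, then wrap the result.
# The first line starts flush left; continuation lines get a 4-space indent.
sub wrap_record {
    my @fields = @_;
    my $line = join ' | ', @fields;
    return wrap('', '    ', $line);
}

print wrap_record(
    'QA76.73.P22 W35 2000',
    'Programming Perl',
    'Wall, Larry; Christiansen, Tom; Orwant, Jon',
), "\n";
```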
New Books List Process at WMU
Now we have our output file…
When we first implemented our list, this
was the whole process. The file was handed
off to the library, where staff separated
the departmental data out of the file,
added the HTML, and put it on our web site.
It took several hours to do this!
New Books List Process at WMU
Once I knew this, I looked into further
automation. Now we have an additional Perl
script that takes care of the rest of the
story.
I looked at the new books’ web pages the
library had created and figured out that I
could break out three sections of static
html.
New Books List Process at WMU
We read the previous output file, paying
attention to which department we’re “in”.
Next, we create a separate .html file for
each department, incorporating the static
HTML sections, adding date information
where necessary.
Finally, these files are put on the library
LAN and a reminder email is sent out.
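The second script's main loop might look something like this sketch. The department marker ("== Name =="), the three static HTML sections, and all subroutine names are assumptions made for illustration; the real markers and HTML came from the library's existing pages. It also treats every input line as one list item, which ignores wrapped continuation lines.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the three static HTML sections.
my $HEAD = "<html><head><title>New Books</title></head><body>\n";
my $MID  = "<ul>\n";
my $FOOT = "</ul>\n</body></html>\n";

sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Walk the combined file, tracking which department we are "in",
# and build one HTML page per department.
sub build_pages {
    my ($month, @lines) = @_;
    my (%page, $dept);
    for my $line (@lines) {
        if ($line =~ /^== (.+) ==$/) {    # assumed department marker
            $dept = $1;
            $page{$dept} = $HEAD . '<h1>' . escape_html($dept)
                         . " ($month)</h1>\n" . $MID;
        }
        elsif (defined $dept && $line =~ /\S/) {
            $page{$dept} .= '<li>' . escape_html($line) . "</li>\n";
        }
    }
    $_ .= $FOOT for values %page;    # values are aliased, so this edits in place
    return %page;
}

# Write each page to its own .html file in the given directory.
sub write_pages {
    my ($dir, %page) = @_;
    for my $dept (sort keys %page) {
        (my $name = lc $dept) =~ s/[^a-z0-9]+/_/g;
        open my $out, '>', "$dir/$name.html"
            or die "Cannot write $dir/$name.html: $!\n";
        print {$out} $page{$dept};
        close $out;
    }
}
```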
New Books List Process at WMU
get last month’s acquisitions
-> break up by department (Department A, Department B, Department C, Department X)
-> static HTML added to each department
-> separate HTML file for each department, ready to be incorporated into the library web pages
-> ftp to Batch PC
-> put on library LAN for the Web Office
New Books List Process at WMU
See the results at:
http://www.wmich.edu/library/newbooks/index.html
Call Number Sorting
Seems right to call it sorting, but it’s
really in the normalization process that
the “magic” occurs.
Uses intelligent parsing, not a quick
regular expression implementation.
Designed with LC call numbers in mind, but
pretty much handles everything, including
locally generated call numbers.
Resulting sorts appear to be about 99%
accurate (my estimate).
The algorithm divides call numbers into
chunks, based on separators.
Call Number Sorting
Explicit separators:
colon (:)
semicolon (;)
comma (,)
period (.)
space ( )
forward slash (/)
Implicit separators:
transitions: alpha->numeric, numeric->alpha
During parsing, separators are absorbed, but
the period may be uniquely retained.
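To make the chunking idea concrete, here is a deliberately simplified sketch. It is not the parser described in this talk: it leans on regular expressions, absorbs every separator (including the period), and does none of the look-ahead or look-back. The subroutine name is made up.

```perl
use strict;
use warnings;

# Split a call number into chunks.
# Step 1: cut at the explicit separators  : ; , . space /
# Step 2: cut each piece again wherever letters and digits meet.
sub chunks {
    my ($call_number) = @_;
    my @pieces = grep { length } split m{[:;,. /]+}, $call_number;
    return map { /[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+/g } @pieces;
}

print join(' | ', chunks('QA76.73.P22 W35 2000')), "\n";
```

For the sample call number this yields the chunks QA, 76, 73, P, 22, W, 35, and 2000.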
Call Number Sorting
Further processing and normalization include:
Whole numbers are treated differently
from decimal numbers.
Decimal numbers may affect as many as
several following chunks.
Look-ahead and look-back for one or
more chunks is also employed.
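Why whole and decimal numbers need different treatment can be shown with a small sketch. The padding width and subroutine names are assumptions; the rules for deciding which chunks are decimal are the harder part and are not reproduced here. Whole numbers are padded on the left so that string order matches numeric order, while decimal digits are padded on the right so that 22 (as in .22) sorts before 3 (as in .3).

```perl
use strict;
use warnings;

# Whole number: pad on the left, so 9 < 76 < 100 as strings too.
sub whole_key {
    my ($n) = @_;
    return sprintf('%06d', $n);
}

# Decimal digits: pad on the right, so .215 < .22 < .3 as strings.
sub decimal_key {
    my ($digits) = @_;
    my $pad = 6 - length $digits;
    return $digits . ('0' x ($pad > 0 ? $pad : 0));
}

my @whole   = sort { whole_key($a)   cmp whole_key($b)   } (100, 9, 76);
my @decimal = sort { decimal_key($a) cmp decimal_key($b) } (3, 22, 215);

print "whole:   @whole\n";      # 9 76 100
print "decimal: @decimal\n";    # 215 22 3
```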
Call Number Sorting
demo…
democall.pl democall.lst
Call Number Sorting
This code is available at:
http://homepages.wmich.edu/~zimmer
preface
The next two programs are designed to run on
PCs, not on a Voyager box.
In order to run them, you will need:
Perl (I use ActiveState)
DBI and DBD for Oracle (get from ActiveState)
Oracle Client software (from Oracle)
preface
Get ActiveState Perl at:
http://www.activestate.com/Products/Download/Register.plex?id=ActivePerl
This puts you at the registration (optional)
screen for the download. At the next page,
you’ll probably want to select the “MSI”
installation for Windows.
Get version 5.6.1.
preface
How to get DBI and DBD:
Once ActivePerl is installed, open a command
prompt window (DOS prompt)
Run PPM
Once in PPM, install DBI and DBD
Exit PPM
preface
Oracle Client software
Required! DBI and DBD rely on this.
Check to make sure that the Oracle
licensing arrangement at your site allows
you to install the client software, if you
do not already have a suitably equipped PC
available.
I used 8.1.6.
The stated combination of versions is the
only one I got to work. This is on machines
running Windows 2000.
Getting Operator Profiles
Demo getprof…
Getting Operator Profiles
Program outline:
Setup and initialization
Look for each possible profile; when found:
get count of affected locations
get profile info
format Y/N boolean values
output info to file in HTML format
Invoke browser to display profile data
Getting Operator Profiles
Query example using the master profile
Getting Operator Profiles
Do some data massaging, then use tables in HTML for formatting.
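The massaging and table output could be sketched like this. It assumes the boolean flags come back from the query as Y or N and are displayed as Yes or No. The labels, subroutine names, and table markup are illustrative, not taken from getprof.

```perl
use strict;
use warnings;

# Turn a Y/N flag into a readable word.
sub yes_no {
    my ($flag) = @_;
    return (defined $flag && uc $flag eq 'Y') ? 'Yes' : 'No';
}

# Build an HTML table from a list of [label, value] pairs.
sub profile_table {
    my ($title, @pairs) = @_;
    my $html = "<h2>$title</h2>\n<table border=\"1\">\n";
    for my $pair (@pairs) {
        my ($label, $value) = @$pair;
        $value = '' unless defined $value;
        $value = yes_no($value) if $value =~ /^[YN]$/i;
        $html .= "<tr><td>$label</td><td>$value</td></tr>\n";
    }
    return $html . "</table>\n";
}

print profile_table(
    'Sample profile',                    # hypothetical profile name
    [ 'Add records',    'Y' ],           # hypothetical permission labels
    [ 'Delete records', 'N' ],
    [ 'Locations',      3   ],
);
```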
Patron Information Dump
Demo patdump…
Patron Information Dump
Program outline:
Setup and initialization.
Run a series of queries to get all the
patron data.
Take resulting data from each query and
format in html, creating a file.
Invoke browser to display patron data.
Both the operator profile and patron dump
programs allow for choice of browser, via an
.ini file, so that different users can use
different browsers.
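A browser setting of this kind might be read and used as in the sketch below. The key name (browser), the key = value layout, and the subroutine names are assumptions, not the actual format used by these programs.

```perl
use strict;
use warnings;

# Read simple "key = value" lines; lines starting with ; or # are comments.
sub read_ini {
    my ($path) = @_;
    my %conf;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next if $line =~ /^\s*(?:[;#]|$)/;
        if ($line =~ /^\s*([^=]+?)\s*=\s*(.*?)\s*$/) {
            $conf{lc $1} = $2;    # keys are stored in lower case
        }
    }
    close $fh;
    return %conf;
}

# Launch the chosen browser on the generated HTML file.
# The list form of system() avoids quoting problems with spaces in paths.
sub show_in_browser {
    my ($browser, $html_file) = @_;
    return system($browser, $html_file);
}

# Example use (file names are placeholders):
#   my %conf = read_ini('patdump.ini');
#   show_in_browser($conf{browser}, 'patron.html');
```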
Sample .ini file
Some users like to explore…
Slow down or stop them by associating an icon with the .bat file and hiding the other files.
[Diagram: the .bat, .ini, .pl, and .html files, each marked either “normal file” or “hidden, read-only”, leading to your choice of browser.]
The code for these two programs is
available at:
http://homepages.wmich.edu/~zimmer
Resources
http://www.tek-tips.com/gfaqs.cfm/pid/219/fid/1711
for general installation issues on a PC
http://metalink.oracle.com
login, then search for 131299.1 for Pentium IV
problems with the client software install
http://www.activestate.com/Products/Download/Register.plex?id=ActivePerl
get your ActivePerl here
http://www.wmich.edu/library/newbooks/index.html
Western Michigan University new books list
http://homepages.wmich.edu/~zimmer
some of the code from this presentation
Thanks for listening.
Questions?
Email: zimmer@wmich.edu
Phone: 269.387.3885