Management Information and the Digital Library

Through the Bytes Darkly,
Management Information and the Digital Library
Information Technology Interest Group
ACRL, New England Chapter
Joe Zucca
Assessment, Planning and Publications Librarian
University of Pennsylvania Library
Four Sections of This Presentation:
1. Environmental Audit: Key Factors That Influence Our Ability to
Measure Digital Information Use
2. From Low Resolution to High Resolution Data: Mining the Server
Logs
3. The Data Farm Experiment: Tools That Serve Access Can Also
Serve Measurement
4. Why the Data Are Important
Measuring Electronic Use at Penn: Environmental Influences
1. Organization and Culture
Strategic Focus
Base planning, goal setting/assessment on empirical evidence. From 1996, an element of Penn’s Strategic Plan
Operational Imperatives
1) Make evaluation and measurement a component of each program and
project
2) Construct relays that feed data to people who need quantitative information
to strategize and manage
Experimental Attitude
Leverage the data you have; usually they’re “good enough” to validate
organizational experience and knowledge
Measuring Electronic Use at Penn: Environmental Influences
2. Proliferation of Electronic Resources
Article indexes, e-journals and other full-text resources
[Figure: bar chart, Number of Titles by year, 1991-2002; vertical axis 0 to 7,000. Bar labels: 3 (1991), 9 (1993), 86 (1996), then 1,394, 2,438, 4,932 and 6,875 for the bars from 1999 through 2002.]
Measuring Electronic Use at Penn: Environmental Influences
2.1. Growth of Expenditures for Electronic Resources
Annual Growth of Expenditures for Electronic Information Based on 1991
[Figure: chart of PCT Increase in Expenditure, 1991-2001; vertical axis 0% to 1000%.]
E-Resources as a percent of acquisitions budget
Year    Percent
1991    3.7%
1993    3.2%
1996    5.5%
1999    13.2%
2000    13.9%
2001    15.7%
Measuring Electronic Use at Penn: Environmental Influences
3. Technology’s Hostility to Measurement
• Volatile metrics (“The new system doesn’t count that way!”)
• Ever-changing data elements (“sets” are out, “searches” are in)
• No common metrics (log-ins, sessions, searches, browses, page hits…)
• No measurement standards (What’s a “search”? What’s a Web “session”?)
• Nonexistent or inaccessible data (the vendor problem)
• Approximate and hard-to-obtain statistics (lots of data, no information)
• Fleeting benchmarks
From Low Resolution to High Resolution Data:
Mining the Server Logs for Descriptive Statistics
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7 C-CCK MCD {C-UDP; EBM-APPLE} (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
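The entries above follow the common "combined" web server log layout (host, identity, user, timestamp, request, status, bytes, referrer, user agent). As a minimal sketch, assuming that layout holds for every line, one record can be split into named fields as follows. The names `LOG_PATTERN` and `parse_log_line` are illustrative and do not come from the presentation.

```python
import re
from datetime import datetime

# Combined log layout: host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 04/Feb/2001:00:18:07 -0500
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# One of the sample entries shown on the slide.
SAMPLE = (
    '203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] '
    '"GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 '
    '"http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" '
    '"Mozilla/4.7 [en] (Win95; I)"'
)
record = parse_log_line(SAMPLE)
print(record["host"], record["status"], record["size"])
```

Lines that do not fit the pattern return `None`, so malformed entries can be counted separately rather than silently dropped.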
Low Resolution
Inputs
Records in locally-managed databases (including the OPAC): 26,332,138
Number of journal article indexes & full-text files (e.g. Academic Index): 267
Number of e-journals (from publishers such as Elsevier and free sources): 6,608
Number of digital books (locally created, aggregated and licensed): 110,000
Number of locally digitized and accessible images (e.g. fine art slides, ms facsimiles): 82,356
Number of records in the OPAC: 2,879,696
Number of pages, forms and directories constituting the library web site: 32,000
Low Resolution
The Load on Our Machines
Web Pages Served 1995-2001 from www.library.upenn.edu (3-month moving average)
[Figure: line chart of Web Pages Requested per Month, Jul-95 through Apr-01; vertical axis 0 to 2,000,000.]
Total pages requested:
Year    Pages         One-year increase
FY96    2,481,146
FY97    5,316,283     114.3%
FY98    7,038,872     32.4%
FY99    11,807,289    67.7%
FY00    12,540,531    6.2%
FY01    14,461,712    15.3%
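The one-year increases quoted with the page totals are plain percent changes from one fiscal year to the next. A short sketch that reproduces them from the totals on the slide (the function name `year_over_year` is illustrative):

```python
# Total pages requested per fiscal year, as listed on the slide.
pages = {
    "FY96": 2_481_146,
    "FY97": 5_316_283,
    "FY98": 7_038_872,
    "FY99": 11_807_289,
    "FY00": 12_540_531,
    "FY01": 14_461_712,
}


def year_over_year(series):
    """Percent change from each period to the next, rounded to one decimal."""
    labels = list(series)
    changes = {}
    for previous, current in zip(labels, labels[1:]):
        growth = (series[current] - series[previous]) / series[previous] * 100
        changes[current] = round(growth, 1)
    return changes


increases = year_over_year(pages)
for year, pct in increases.items():
    print(f"{year}: {pct}%")
```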
Low Resolution
Changing Machine Demand
Pages Served by the Main Library Web Server + OPAC Server
[Figure: chart of pages served per year, 1996-2002 (2002 projected); vertical axis 0 to 25,000,000; chart labels: Web, OPAC, BlackBoard.]
Low Resolution
Search Activity Over Time
Annual Searches in Licensed Databases (e.g., MEDLINE), FY97-01
[Figure: chart of searches per fiscal year; vertical axis 0 to 3,500,000; horizontal axis labels fy97, fy99, fy00, fy01.]
Correlation Matrix of Use Metrics Available for Ovid Files
Pearson r for Sessions, Connect Time, Sets, Documents Viewed (99 cases)

                    Sessions   Time    Sets    Docs. Viewed
Sessions            1.00
Time                .980       1.00
Sets                .905       .971    1.00
Documents Viewed    .844       .932    .983    1.00

[Figure: three scatter plots of paired metrics; axes labeled sessions, time and sets.]
Correlation Matrix of Use Metrics Available for SilverPlatter Files
Pearson r for Sessions, Connect Time, Searches, Documents Viewed (94 cases)

                    Sessions   Time    Searches   Abs. Viewed
Sessions            1.00
Time                .975       1.00
Searches            .899       .901    1.00
Abstracts Viewed    .840       .870    .855       1.00

[Figure: three scatter plots of paired metrics; axes labeled Sessions, Time and Searches.]
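The correlation matrices above report Pearson r for each pair of vendor-supplied metrics. The underlying case data are not in the slides, so the sketch below uses invented monthly figures purely to show the calculation; `pearson_r`, `correlation_matrix` and the numbers in `usage` are all hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 cases)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson r for a dict of metric name -> list of values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Invented figures for illustration only; not data from the presentation.
usage = {
    "sessions": [120, 340, 560, 800, 1010],
    "time": [15, 41, 70, 95, 130],
    "sets": [300, 900, 1300, 2100, 2400],
}
matrix = correlation_matrix(usage)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

When the metrics correlate as strongly as the slide reports, any one of them can stand in for the others as a rough measure of use, which is the practical point of computing the matrix.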
High Resolution Data + User Input + Good Program Liaison and Knowledge
Support Resource Management, and Inform Basic Questions, e.g.:
• Are we choosing the right information sources for our audiences?
• …optimizing the delivery of electronic information?
• …making access as easy and seamless as possible?
• …spending our dollars wisely?
• …able to detect and respond to change in the patterns of resource use?
Using the Architecture of the Web to Increase Data Resolution
www.library.upenn.edu/facilities/count_use.html
Beginning with a stream of unprocessed log data...
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:17:38 -0500] "GET /special/photos/theater/505.html HTTP/1.0" 200 3086 "http://www.library.upenn.edu/special/photos/theater/504.html" "Mozilla/4.7C-CCK-MCD {C-UDP; EBM-APPLE} (Macintosh; I; PPC)"
recrawler1.bos2.fastsearch.net - - [04/Feb/2001:00:18:21 -0500] "GET /etext/sasia/skt-mss/1549/15a.html HTTP/1.0" 200 2736 "-" "FAST-WebCrawler/2.2-pre27 (crawler@fast.no; http://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)"
130.91.196.245.in-addr.arpa - - [04/Feb/2001:00:17:40 -0500] "GET /facilities/count_use.html?resource=ABI/Inform%20%20Ovid&method=Ovid&url=http://www.abi-ovid.library.upenn.edu/ovidweb/ovidweb.cgi?T=JS&PAGE=main&MODE=ovid&D=infoz HTTP/1.1" 200 2039 "http://www.library.upenn.edu/webbin5/resources/databases.cgi?business" "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)"
203.197.226.240 - - [04/Feb/2001:00:17:41 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010.html HTTP/1.0" 200 4427 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/" "Mozilla/4.7 [en] (Win95; I)"
203.197.226.240 - - [04/Feb/2001:00:17:44 -0500] "GET /images/banner.gif HTTP/1.0" 404 2814 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
pub237.lib.upenn.edu - - [04/Feb/2001:00:17:48 -0500] "GET / HTTP/1.0" 200 8070 "-" "WebTrends Alert"
dial-123-130.dial.indiana.edu - - [04/Feb/2001:00:18:02 -0500] "GET /special/photos/theater/504.html HTTP/1.0" 200 3247 "http://www.library.upenn.edu/special/photos/theater/503.html" "Mozilla/4.7 C-CCK MCD {C-UDP; EBM-APPLE} (Macintosh; I; PPC)"
dialin1085.upenn.edu - - [04/Feb/2001:00:18:04 -0500] "GET /facilities/count_use.html?resource=China%20Economic%20Review&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X HTTP/1.0" 200 2027 "http://www.library.upenn.edu/webbin5/resources/ejspublic5.cgi?homepage=http://www.library.upenn.edu/lippincott/&community=Business" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt; SPIKE 5)"
203.197.226.240 - - [04/Feb/2001:00:18:07 -0500] "GET /etext/sasia/aiis/architecture/khajuraho/010a.jpg HTTP/1.0" 200 89117 "http://www.library.upenn.edu/etext/sasia/aiis/architecture/khajuraho/010.html" "Mozilla/4.7 [en] (Win95; I)"
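In the log stream above, requests for /facilities/count_use.html carry the name of the resource, the access method and the destination URL in the query string, which is what turns an ordinary page hit into a record of resource use. A hedged sketch of pulling those three values out of a request path follows. It assumes the parameters always arrive in the order resource, method, url, with url last and unencoded, as in the sample lines; the function name `parse_click_through` is illustrative and the library's own scripts are not shown in the slides.

```python
from urllib.parse import unquote

COUNTER_PATH = "/facilities/count_use.html"


def parse_click_through(request_path):
    """Extract resource, method and target URL from a counter-page request.

    Returns None for requests that are not click-throughs.
    """
    path, has_query, query = request_path.partition("?")
    if path != COUNTER_PATH or not has_query:
        return None
    # The destination URL is not percent-encoded and may contain its own
    # "?" and "&", so everything after "&url=" is kept whole.
    head, has_url, target = query.partition("&url=")
    if not has_url:
        return None
    fields = {}
    for pair in head.split("&"):
        key, _, value = pair.partition("=")
        fields[key] = unquote(value)
    fields["url"] = target
    return fields


click = parse_click_through(
    "/facilities/count_use.html?resource=China%20Economic%20Review"
    "&method=ejs&url=http://www.sciencedirect.com/science/journal/1043951X"
)
print(click["resource"], "|", click["method"], "|", click["url"])
```

A general-purpose query-string parser would split the destination URL at its own ampersands, which is why the sketch cuts the string at `&url=` first.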
…and information culled from databases that generate our Web pages...
Æ|http://www.uqtr.uquebec.ca/AE/index.html|World||||History of Art|FT|No|07-16-1999 : 11:11|10-25-2000 : 11:30||
ABA Bank Compliance|http://proquest.umi.com/pqdlink?Ver=1&Exp=07-012003&REQ=3&PUB=14954&Cert=0CEccdp7aMS6kuCDmdhPNL%2bQ2tTOLTrDEHAz%2bYmHN172RUqZPCJ2SvATX%2bFGA7htIYkVlFVWSyawE0NvKlpBZ%2bO%2f%2bLEWBnchnwLT9%2b%2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdDKBum2vykhvxsyRQutjuMGKfxAKHOA4-|Penn|ABI/Inform|||Business,Finance|F-TPI|No|03-13-2001 : 00:01|03-14-2001 : 11:31|mw|
ABA Journal|http://proquest.umi.com/pqdlink?Ver=1&Exp=07012003&REQ=3&PUB=27585&Cert=PfySiFXf10i6kuCDmdhPNL%2bQ2tTOLTrDEHAz%2bYmHN172RUqZPCJ2SvATX%2bFGA7ht1pGvDP%2bFxrGwE0NvKlpBZ%2bO%2f%2bLEWBnchnwLT9%2b%2fdGGHSlx0PO3dxUQd3g2S9QP2FghKaQ2ncl5EdDKBum2vykhvxsyRQutjuAyIsegc4Y7Y-|Penn|ABI/Inform|||Finance|F-TPI|No|03-13-2001 : 00:01||mw|
ABI/Inform|http://www.umi.com/pqdauto|Penn||||Biomedical Research,Management,Business,Clinical Medicine,Clinical Medicine,Nursing,Economics,Health Care Policy & Management|F-TSDb|No|07-16-1999 : 11:11|02-09-2001 : 12:14||
…to extracting, parsing, storing, and mining for significant content.
hrn1117.resnet-student.upenn.edu | 4-Feb-01 | 17:25:31 | Music Index Online | | |
dhcp004.lib.upenn.edu | 4-Feb-01 | 17:26:54 | LEXIS/NEXIS Academic Universe | | |
node.uphs.upenn.edu | 4-Feb-01 | 17:27:13 | Internet Grateful Med | | |
biomed07.med.upenn.edu | 4-Feb-01 | 17:27:44 | MEDLINE | | |
janus.siast.sk.ca | 4-Feb-01 | 17:30:18 | Design Studies | ScienceDirect | 05,24,40,68 | F-TPI
lrsm228pc2.lrsm.upenn.edu | 4-Feb-01 | 17:30:29 | Physical Review B | | 10,55 | F-TPI
sub200141.colstate.edu | 4-Feb-01 | 17:32:15 | Representations | JSTOR | 05,14,25,29,31,36,38,64 | F-TPI
ref124.lib.upenn.edu | 4-Feb-01 | 17:36:05 | British Journal of Aesthetics | Oxford | 54 | F-TPI
client-151-200-195-5.silvmul.com | 4-Feb-01 | 17:37:20 | Dow Jones Interactive | | 08,09,13,20,27,32,37,51,56 | F-TSDb
janet.sas.upenn.edu | 4-Feb-01 | 17:39:28 | MEDLINE | | |
130.91.196.245 | 4-Feb-01 | 17:40:43 | ABI/Inform | Ovid | 08,44,09,13,20,27,32,3 | F-TSDb
198.84.16.26 | 4-Feb-01 | 17:41:48 | MEDLINE | | |
ip-20-12-201.philly-t.navipath.net | 4-Feb-01 | 17:44:52 | Dow Jones Interactive | | 08,09,13,20,27,32,37,51,56 | F-TSDb
t13pc.seas.upenn.edu | 4-Feb-01 | 17:48:10 | MEDLINE | | |
dialin1085.upenn.edu | 4-Feb-01 | 18:04:05 | China Economic Review | ScienceDirect | 09 |
dhcp066.lib.upenn.edu | 4-Feb-01 | 18:07:12 | Anthropological Literature | | |
hil-235-113.resnet.upenn.edu | 4-Feb-01 | 18:09:08 | Journal of Sociolinguistics | ECO | 02,43,64 | F-TPI
dhcp082.lib.upenn.edu | 4-Feb-01 | 20:11:30 | Art Index | | |
gvdf.med.upenn.edu | 4-Feb-01 | 21:07:12 | Biochemistry | ACS | 10 | F-TPI
lsw-103-197.greeknet-student.upenn.edu | 4-Feb-01 | 22:07:12 | LEXIS/NEXIS Academic Universe | | |
nic-21-95.resnet.upenn.edu | 4-Feb-01 | 23:07:12 | ERIC | | |
Use of Licensed Resources
What Databases Do Our Clients Use at What Cost?
15 Most Frequently Used Index/Abstract/Full-text Databases in FY 2001
Database                     Log-ins    Pct Total   Cost Per Login
MEDLINE                      205,150    22.9%       $0.10
LEXIS/NEXIS                  63,817     7.1%        $0.42
Academic Index               52,407     5.9%        $0.58
Dow Jones                    39,828     4.5%        $0.68
ISI Citation Indexes         39,753     4.4%        $2.75
ABI/Inform                   36,190     4.0%        $1.09
PsycINFO                     27,636     3.1%        $0.89
Investext                    17,695     2.0%        $0.68
Business & Industry          16,797     1.9%        $0.55
CINAHL/Nursing               16,232     1.8%        $0.36
PubMed                       15,610     1.7%        $ -
MLA International            13,359     1.5%        $0.41
Multex                       12,196     1.4%        $0.10
ERIC                         10,852     1.2%        $0.54
EconLit                      8,940      1.0%        $0.80
Hoovers Online               8,905      1.0%        $0.22
Inter Bibliog Soc Science    8,152      0.9%        $0.38
Sociological Abstracts       7,703      0.9%        $1.58
S&P Industry Surveys         7,346      0.8%        $0.63
D&B Million $ Database       6,376      0.7%        $1.74
All others
Total                        894,416    100.0%
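Both derived columns in the database table are simple ratios: a database's share of all log-ins, and its annual cost divided by its log-ins. The slides give the log-in counts and the total but not the underlying license costs, so in the sketch below the cost figure is invented and marked as such; `usage_report` and `hypothetical_cost` are illustrative names.

```python
def usage_report(logins, annual_cost, total_logins):
    """Share of total log-ins and cost per log-in for each database."""
    rows = []
    for name, count in logins.items():
        cost = annual_cost.get(name)
        rows.append({
            "database": name,
            "logins": count,
            "pct_total": round(count / total_logins * 100, 1),
            # No cost on file (for example a free resource) leaves this blank.
            "cost_per_login": None if cost is None else round(cost / count, 2),
        })
    return rows


# Log-in counts and grand total as shown on the slide.
logins = {"MEDLINE": 205_150, "LEXIS/NEXIS": 63_817, "Academic Index": 52_407}
TOTAL_LOGINS = 894_416

# Invented cost for illustration only; the slides do not give license fees.
hypothetical_cost = {"MEDLINE": 20_000.00}

report = usage_report(logins, hypothetical_cost, TOTAL_LOGINS)
for row in report:
    print(row)
```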
Use of Licensed Resources
What Are the High Use E-Journals, Data for FY2001
Title                                                    Log-ins   Pct Total   On Campus   Off Campus
Science                                                  4,232     1.5%        3,114       1,057
Nature                                                   4,081     1.4%        2,880       1,173
Journal of Biological Chemistry                          2,408     0.8%        1,883       519
Journal of the American Chemical Society                 2,405     0.8%        2,153       247
New England Journal of Medicine                          1,994     0.7%        1,359       620
Angewandte Chemie (international edition)                1,836     0.6%        1,665       167
Journal of Organic Chemistry                             1,660     0.6%        1,504       150
Proceedings of the National Academy of Sciences          1,608     0.6%        1,246       360
Tetrahedron Letters                                      1,361     0.5%        1,218       143
Organic Letters                                          1,308     0.5%        1,208       99
Proceedings of the National Academy of Sciences, U.S.    1,285     0.5%        1,017       266
Journal of Molecular Biology                             1,060     0.4%        850         210
JAMA: The Journal of the American Medical Association    1,023     0.4%        650         352
Journal of Chemical Physics                              992       0.3%        819         172
Journal of Finance                                       887       0.3%        423         378
Lancet                                                   867       0.3%        637         227
American Journal of Sociology                            860       0.3%        384         373
Medicine                                                 849       0.3%        580         263
Applied Physics Letters                                  834       0.3%        751         83
Physical Review B                                        826       0.3%        727         98
Use of Licensed Resources
How Much Bang Do We Get on the Dollar For E-Journals?
E-Journal Subscription Costs Per Log-In, FY2002 (July-April)
Publisher               Log-ins    Pct of Total   Cost Per Login
ScienceDirect           139,727    27.1%          $0.63
ECO                     70,730     13.7%          $0.09
JSTOR                   48,668     9.4%           $0.35
Wiley                   38,255     7.4%           $0.09
ACS                     31,865     6.2%           $0.12
Ideal                   30,568     5.9%           $5.51
Blackwell/Munksgaard    28,940     5.6%           $0.27
Journals@Ovid           26,982     5.2%           n/a
Oxford                  14,819     2.9%           $0.20
SpringerLINK            13,507     2.6%           n/a
ABI/Inform              12,785     2.5%           $3.08
Project Muse            11,438     2.2%           $1.22
AIP                     7,873      1.5%           $5.01
Cambridge               7,835      1.5%           n/a
Annual Reviews          7,215      1.4%           $0.08
IEEE                    7,132      1.4%           $6.73
RSC                     5,661      1.1%           n/a
Others†                 11,451     2.2%
Total                   515,451    100%
† 11 publishers
Use of Licensed Resources
How Does Use Scatter Across Databases
Use Measured in Log-ins for FY 2001
[Figure: cumulative distribution curve; vertical axis PCT of Use, horizontal axis PCT of Titles, both 0% to 100%.]
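A scatter-of-use chart like the one above is built by ranking titles from most to least used and plotting the running share of use against the running share of titles. The plotted values are not recoverable from the slide, so the sketch below runs on invented log-in counts; `concentration_curve`, `titles_needed` and `sample_logins` are hypothetical names.

```python
def concentration_curve(counts):
    """Cumulative (pct of titles, pct of use) points, most-used titles first."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    points = []
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        points.append((100 * rank / n, 100 * running / total))
    return points


def titles_needed(counts, share_of_use):
    """Smallest percent of titles that accounts for at least share_of_use percent of use."""
    for pct_titles, pct_use in concentration_curve(counts):
        if pct_use >= share_of_use:
            return pct_titles
    return 100.0


# Invented log-in counts for five titles, for illustration only.
sample_logins = [5, 50, 10, 30, 5]
curve = concentration_curve(sample_logins)
print(curve)
print(titles_needed(sample_logins, 80))
```

Reading the curve this way answers questions such as "what fraction of our titles delivers 80% of the use?", which is how the chart informs selection and cancellation decisions.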
Use of Licensed Resources
How Does Database Use Distribute By Communities?

Database Use by Penn’s Schools & Centers
School        Pct of Log-ins
Medicine      23.2%
Arts & Sci    20.4%
In-Library    12.8%
Wharton       12.3%
Admin         4.8%
Enginrng      4.4%
Nursing       3.5%
Veterinary    2.5%
Education     1.4%
Social Wrk    1.8%
Commnctn      1.3%
Fine Arts     1.0%
Dental        0.9%
Law           0.5%
Dorms         9.3%

Per Capita Use of Databases by Penn’s Schools and Centers, FY 2001
[Figure: bar chart of Log-ins Per Capita by School and Center Domains; vertical axis 0 to 55.]
†Does not include resources licensed by the Law Library for Law school affiliates
Use of Licensed Resources
Database & E-Journal Log-ins by Subject (based on log samples from FY2001)
Subject focus by Network Domain

Network Domain        Human.   Life Science   Social Science   Business   Physical Science   Total
Administration        21.1%    36.5%          13.9%            7.0%       21.6%              100.0%
Wharton               2.9%     74.3%          3.2%             19.2%      0.5%               100.0%
Annenberg             15.2%    32.1%          42.3%            8.9%       1.5%               100.0%
Medical               2.3%     86.0%          1.9%             1.0%       8.8%               100.0%
Dental                1.8%     87.7%          8.9%             0.2%       1.4%               100.0%
Veterinary            1.7%     96.0%          0.6%             0.4%       1.3%               100.0%
Dialin                8.5%     63.2%          9.9%             15.4%      2.9%               100.0%
Education             24.6%    13.1%          61.5%            0.8%       0.0%               100.0%
Fine Arts             29.0%    18.5%          45.7%            5.6%       1.2%               100.0%
Law                   13.0%    26.6%          20.9%            37.0%      2.4%               100.0%
Library               21.3%    54.8%          9.1%             8.5%       6.3%               100.0%
Nursing               15.9%    73.1%          7.8%             3.2%       0.0%               100.0%
Student Residences    18.9%    57.0%          12.6%            9.0%       2.5%               100.0%
Arts and Sciences     8.2%     26.3%          5.7%             9.9%       49.9%              100.0%
Engineering           1.5%     29.5%          2.3%             1.2%       65.6%              100.0%
Social Work           20.6%    29.1%          41.6%            6.1%       2.7%               100.0%
Unresolved            18.9%    44.7%          17.8%            10.0%      8.6%               100.0%
Total                 14.7%    50.7%          11.9%            8.6%       14.1%              100.0%
Use of Licensed Resources
Where Do Our Clients Access Information?
Database Log-ins by Domain, FY2001
Campus Residences    10%
Off-Campus           15%
In-Library           25%
On-Campus Depts      50%
Use of Licensed Resources
Where Do Communities of Clients Work?
Database Log-ins from Off Campus as a Percent of Total Log-ins, FY2001
[Figure: chart of Pct. of Log-ins (0% to 100%) by School or Center; series On Campus and Off-Campus.]
Use of Licensed Resources
When Are They Working?
Database Use by Time of Day, FY2001
[Figure: chart of Attempted Logons by hour of day, 12-1 AM through 11-12 PM; vertical axis 0 to 25,000; series In-Library, Student Houses, Schools and Campus Modem Pool.]
Use of Licensed Resources
How Does Audience Composition Change Through the Day?
Database Use by hour, FY2001
[Figure: chart of Percent of Total Logons for Each Hour, 12-1 AM through 11-12 PM; vertical axis 0% to 60%. Network Domains: In-Library, Student Houses, Schools, Campus Modem Pool, All Other (non-Penn domains).]
Chart note: The points on this line total 100% of logons for the hour 5-6 AM. In this example, "All Other Domains" accounted for 46.6% of attempted log-ons during the given hour.
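The hourly composition chart rests on two steps: assigning each requesting host to a network domain, and computing each domain's share of log-ons within every hour so that the shares for an hour total 100%. The sketch below illustrates both. The host-matching rules in `classify` are guesses made for this example from the hostnames visible in the sample records, not the library's actual mapping, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def classify(host):
    """Assign a hostname to a network domain (rules are assumptions)."""
    host = host.lower()
    if host.endswith(".lib.upenn.edu"):
        return "In-Library"
    if not host.endswith(".upenn.edu"):
        return "All Other"
    if "resnet" in host:
        return "Student Houses"
    if host.startswith("dialin"):
        return "Campus Modem Pool"
    return "Schools"


def hourly_shares(records):
    """Percent of log-ons per domain within each hour of the day."""
    counts = defaultdict(Counter)
    for host, time_text in records:
        hour = int(time_text.split(":")[0])
        counts[hour][classify(host)] += 1
    shares = {}
    for hour, by_domain in counts.items():
        total = sum(by_domain.values())
        shares[hour] = {d: 100 * n / total for d, n in by_domain.items()}
    return shares


# Host and time pairs taken from the extracted records shown earlier.
records = [
    ("hrn1117.resnet-student.upenn.edu", "17:25:31"),
    ("dhcp004.lib.upenn.edu", "17:26:54"),
    ("biomed07.med.upenn.edu", "17:27:44"),
    ("janus.siast.sk.ca", "17:30:18"),
    ("dialin1085.upenn.edu", "18:04:05"),
    ("dhcp066.lib.upenn.edu", "18:07:12"),
]
shares = hourly_shares(records)
for hour in sorted(shares):
    print(hour, shares[hour])
```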
The Data Farm Experiment: Tools That Serve Information Access Can
Also Serve Measurement
Schematic of the Data Farm
As of May 2002
[Diagram: Data Farm Processes. Component labels: Scripts, Server logs, Oracle, Staff Client, Server array, DLXS, Voyager.]
Perils of the MIS Prototype: Lessons Learned
Normalize the Data
Regularize the Migration of Logs from Production Machines
Manage the Storage
Maintain the Scripts
Standardize Processes: program modules, plug-in scripts
Optimize Usability
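"Normalize the Data" matters because the same resource can appear under variant spellings; elsewhere in this presentation, for instance, both "LEXIS/NEXIS" and "Lexis-Nexis" and both "PsycINFO" and "PsycInfo" occur. One possible approach, offered only as a sketch, is to reduce each name to a comparison key before counting. The functions `normalize_key` and `merge_counts` and the counts in the example are hypothetical.

```python
import re
from collections import Counter


def normalize_key(name):
    """Lower-case a resource name and collapse punctuation and spacing."""
    key = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(key.split())


def merge_counts(raw_counts):
    """Sum counts for names that share a key; the first spelling seen becomes the label."""
    totals = Counter()
    labels = {}
    for name, count in raw_counts:
        key = normalize_key(name)
        labels.setdefault(key, name)
        totals[key] += count
    return {labels[key]: total for key, total in totals.items()}


# Invented counts for illustration only.
raw = [("LEXIS/NEXIS", 3), ("Lexis-Nexis", 2), ("PsycINFO", 1), ("PsycInfo", 4)]
merged = merge_counts(raw)
print(merged)
```

Key-based matching only catches spelling and punctuation variants; names that differ in substance still need a maintained alias table.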
Why Are the Data Important?
“If you don’t know where you’re going, you’ll probably end up somewhere else” - Casey Stengel
To Demonstrate Accountability:
Is the library spending the Schools’ money effectively?
(Pressures of Penn’s responsibility center budget environment)
To Understand and Describe the Transfer of Technology:
Is the academic information universe a digital universe (as some at
Penn believe)?
Is the digital universe more cost efficient than the paper one (as some
at Penn believe)?
To Guide the Improvement of Existing and the Development of
New Services
To Ensure the Successful Fulfillment of Our Mission
Joe Zucca
University of Pennsylvania Library
zucca@pobox.upenn.edu
Return-Path: <olson@pobox.upenn.edu>
Subject: Again, testing general databases
To: sblack@asc.upenn.edu
Date: Wed, 10 Apr 2002 16:54:11 -0400 (EDT)
From: olson@pobox.upenn.edu
Dear Sharon -
Just a second quick note begging you, please, keep trying to look
at those three databases!
Data farm usage logs indicate that one-quarter of all database
logins from Annenberg IP addresses in 2001 were pointing to
Academic Index (followed by Lexis-Nexis and PsycInfo, both with
about 10-percent of all Annenberg database logins).
Also, 15-percent of all Academic Index school-based logins last
year came from Annenberg IP addresses, more than from all schools
except Arts and Sciences (at 30-percent).
Considering how much Annenberg people use the general database -
and you must know best how they can raise Holy Ned over the least
change, I hope that you can find the time to check out the three
candidate databases. I'm happy to come over and walk you through
the log-in.
[Figure: scatter plot of Log-ins (0 to 160) against Reshelves (0 to 300) for Journal of the American Chemical Society, Journal of Organic Chemistry and Tetrahedron Letters; R2 = 0.2211.]
Basic Search (262952)
Parameter           Search Count
Title               69498
Author              51080
Journal Title       49437
Subject             44592
Keyword             40727
Call Number         4905
Prolific Author     776
Relevance Ranked    743

Guided Keyword Search (111710)
Parameter                           Frequency
Title                               41389
Keyword General                     37256
Author                              21725
ISSN [xxxx-xxxx]                    17378
Subject                             11580
ISBN                                2422
Series Title                        389
Publisher Name                      334
Publication Date                    197
Publication Place                   113
Contents Note                       96
Conference                          77
Corporate Author                    24
Music opus/Thematic index no.       21
Publisher no.(music,sound,video)    20
vetlib1.vet | 11:20:46 | CAB Abstracts
vetlib1.vet | 17:14:22 | ISI Citation Indexes
vetlib1.vet | 17:17:32 | MEDLINE
vetlib1.vet | 9:19:50 | CAB Abstracts
vet119.lib | 18:08:49 | OVID Multi-Database Searching
vet119.lib | 20:01:37 | OVID Multi-Database Searching
vet119.lib | 17:58:17 | OVID Multi-Database Searching
vet118.lib | 12:50:03 | jake
vet118.lib | 14:13:43 | BIOSIS Previews
vet118.lib | 14:38:48 | Bryologist
vet118.lib | 14:41:30 | Invitro Animal Cellular & Developmental Biology
vet118.lib | 14:43:24 | Condor
vet118.lib | 14:52:22 | Evolution
vet118.lib | 14:53:29 | Journal of Eukaryotic Microbiology
vet118.lib | 14:54:27 | Journal of Mammalogy
vet118.lib | 11:51:39 | OVID Multi-Database Searching
vet118.lib | 14:57:26 | Journal of Medical Entomology
vet118.lib | 16:28:24 | WorldCat
vet118.lib | 16:32:44 | RLIN
vet115.lib | 19:02:16 | OVID Multi-Database Searching
vet115.lib | 17:08:02 | Journal of Biological Chemistry
vet115.lib | 14:44:11 | CAB Abstracts
vet115.lib | 14:21:51 | Journals@OVID Full Text
vet115.lib | 14:04:40 | MEDLINE
vet115.lib | 15:07:49 | CAB Abstracts
vet115.lib | 15:08:33 | MEDLINE
vet115.lib | 18:46:13 | MEDLINE
vet115.lib | 11:06:02 | Journal of Veterinary Medical Education
vet115.lib | 11:06:23 | Veterinary Research
vet115.lib | 11:08:03 | Vet On-line: the International Journal of Veterinary Medicine
vet115.lib | 11:09:54 | MEDLINE
vet115.lib | 15:06:20 | CAB Abstracts
vet115.lib | 10:52:24 | CAB Abstracts
vet115.lib | 14:57:19 | CAB Abstracts
vet115.lib | 17:26:32 | Journal of Biological Chemistry
vet115.lib | 15:50:19 | CAB Abstracts
vet115.lib | 13:36:47 | Animal Genetics
ScienceDirect Articles Viewed, FY 2001
[Figure: cumulative distribution curve; vertical axis labeled PCT of Titles, horizontal axis PCT of Science Direct Titles, both 0% to 100%.]
Academic Press (Ideal) Articles Viewed, FY 2001
[Figure: cumulative distribution curves for two series, NERL Pct used and Penn Use; vertical axis PCT of Use (full text views), horizontal axis PCT of Ideal Titles, both 0% to 100%.]