Music to our Eyes: Music Content in Google Books, Google Scholar,
and the Open Content Alliance
Kirstin Dougan, Music and Performing Arts Librarian, Music and Performing Arts Library,
University of Illinois at Urbana-Champaign
This paper is an expansion of a talk given at the Music Library Association Annual Meeting in
Chicago, IL, in February 2009.
Abstract
The Internet has changed how scholars perform their research. With more and more
content available online that is fully searchable (often even for free), it’s not surprising that
students and others may turn there first before coming to the library. This study aims to assess
the utility for music research of three of the largest free access points to print materials online:
Google Book Search, the Internet Archive, and Google Scholar. This
study discovers that there are print music materials represented in all three of these tools and that
they should not be eschewed by music librarians and others researching music topics.
Introduction
Google Book Search, the Internet Archive, and Google Scholar represent a large part of
the rapidly expanding digital universe, and are often where students turn first when looking for
information sources for research projects in every discipline, including music.1 The convenience
of 24x7 access from anywhere via an internet connection often trumps our carefully selected but
sometimes hard to use subscription tools in students’ minds. While there are certainly several
high-quality digital projects hosted by libraries and others that include music content,2 these may
not be the first place students look for resources; they instead often turn to Google because it’s
what they know and are comfortable with. Rather than chastising students for not coming to the
library, we should instead try to find value in the tools they already use and discover how their
Page 1 of 27
September 10, 2009
use might be successfully integrated with our tools. Karen Schneider observed in 2006 that “The
User is Not Broken,”3 and therefore it follows that librarians should be familiar with the tools our
users are using beyond the library’s resources.
This paper will examine two of the largest mass digitization projects in recent years,
Google Book Search and the Open Content Alliance’s contributions to the Internet Archive, as
well as Google’s attempt to identify scholarly communications: Google Scholar. The goal is to
present a survey of the scope and extent of music-related materials located in these three tools.
The focus is on printed music (scores), music monographs, and music periodicals; the recordings in
the Internet Archive are excluded. Anecdotal evidence suggests that there is not much, or
perhaps not much of quality, in these three tools for music librarians and others looking for
music and music research materials to consider. However, this is not the case. While it’s not
possible to provide a complete list of all of the music content in these tools, it is possible to
outline the major items of significance found in them. Although each of these tools has its
drawbacks, and for the most part they are a far cry from the controlled environments in which
librarians are most comfortable (e.g., library catalogs and subscription journal databases), they
do offer content and features that we should not ignore, because our users certainly aren’t.
What are they?
Google Book Search and the Open Content Alliance’s contributions to the Internet
Archive are both access digitization projects, meaning their goal is to digitize large quantities of
materials in a cost-effective manner to allow increased discovery of and greater access to these
materials than is possible for their print versions alone.4 Both provide scanned images of books
and other content that have been made searchable through optical character recognition (OCR).
Typical OCR programs do not know how to encode musical notes, which is likely one reason
why more musical scores have not been included in either of these projects. The scores that are
included are therefore only searchable by title, composer, publisher, and perhaps instrument and
movement names, if this information is included in the metadata. Whereas books and other print
materials can be searched by their full text, scores cannot. An additional problem presented by
scanning scores is their size. While most monographs are of a somewhat consistent size, musical
scores can be very small or very, very large, with no standard size between publishers (although
individual publishers may have their own standard size). Music scores can also have unusual
binding configurations not common in books that could present a challenge for the scanning
operators. Additionally, scores may also consist of a score showing all of the instrument/voice
parts and individual parts for each performer. While this could potentially present a problem
when digitizing them, Google and OCA’s scanning setups should be flexible enough within cost-effective boundaries to accommodate scores with parts.
Google Book Search
Google Book Search (GBS), launched in beta as Google Print in 2003, renamed in 2004,
and graduated from beta in mid-2009, states that its aim is “to help you discover books and learn
where to buy or borrow them, not read them online from start to finish.”5 This is worth
remembering when evaluating how much of its content is available in full-text and how much is
only indicated as a citation or limited preview. From its earliest days, GBS has included
information and (often limited) content for many items that are supplied directly by their
publishers, and currently has information from 25,000 publishers in over 100 countries and 100
languages.6 GBS famously partnered with libraries, which then became the primary contributors of
full-text content. These include American university libraries and the NYPL, in addition to
notable European national and university libraries such as Oxford. The full-text content in large
part is out of copyright,7 while entries for in-copyright materials offer only a limited preview or
just a citation.
For many titles GBS offers the table of contents, information about other editions of the
same title, information about related books, references to the book from other books, journals, or
websites, and links to reviews. Also very helpful is the “Find in a Library” feature, which is
powered by Open WorldCat.
Open Content Alliance/Internet Archive
The Open Content Alliance (OCA) was formed in 2005 specifically to counter GBS.8 It
was intended to be a model of transparency as opposed to the perceived secrecy of the Google
business model, although there is disagreement about whether a higher level of transparency has
indeed been reached.9 It is a consortium of academic and public libraries and organizations
contributing to a permanent, publicly-accessible archive of digitized texts. They include the
Boston Library Consortium, the California Digital Library, CARLI, Lyrasis, TRLN, and many
individual libraries, including the Library of Congress and the NYPL. Also included are GBS
project libraries including the University of Illinois, Harvard, and Oxford.10 The contributed
content from these libraries forms the Americana Collection in OCA. Microsoft funded much of
the initial OCA scanning, and was primarily interested in Americana, therefore “only those very
few libraries who pay for their own scanning with OCA got to choose what was actually
scanned. Microsoft didn’t specify individual titles for the libraries they funded, but they certainly
specified subject coverage.”11 Other collections in the text archive include over 300,000 items
from Canadian Libraries, European Libraries, and the Universal Library Project, (aka the Million
Books Project), Project Gutenberg, and others.
The materials contributed by OCA members are delivered via the Open Access Text
Archive section of the Internet Archive (which is why the names OCA and Internet Archive are
sometimes used interchangeably). Like GBS, the content is in large part out of copyright, but
there are in-copyright materials for which OCA has obtained permissions from publishers
beforehand.12 However, it has been observed that OCA has misstated the copyright status of
some items (claiming some things are under copyright that are not and vice versa).13
Google is the larger of the two projects, with over seven million items,14 compared to the
Open Content Alliance’s Americana Collection’s total of just less than one million items, but of
course both of these numbers are rapidly increasing.
Google Scholar
Released in beta in 2004 (and unlike GBS still in beta), Google Scholar (GS) is a bit
different from GBS and OCA, in that it focuses solely on what it deems “scholarly” literature,
including journal content in addition to book and dissertation citations. GS searches the same
content as Google.com, but limits results to what it deems scholarly “peer-reviewed papers,
theses, books, abstracts and articles, from academic publishers, professional societies, preprint
repositories, universities and other scholarly organizations. Google Scholar helps you identify
the most relevant research across the world of scholarly research.”15 OCLC loaded citations for
most of WorldCat into Google and Google Scholar, which vastly increases their scholarly
content.16 GS claims to help users “identify the most relevant research across the world of
scholarly research…weighing the full text of each article, the author, the publication in which the
article appears, and how often the piece has been cited in other scholarly literature.”17 This
methodology somewhat disadvantages the most recently-published research because its newness
dictates that it won’t have many, if any, citations yet.
Searches can be limited by date, author, or publication, among other things. Results are
sometimes citation only, sometimes a link to a content service provider, and sometimes a link
directly to a PDF. The service provider links lead to various sources, not necessarily ones your
institution subscribes to, so researchers may need to do another search in a library-owned
resource to find full-text availability. Very useful features of GS include an item’s “Cited by”
counts and links, “related articles” links, and “Discover” SFX linking (if available at your
institution).18 This citation linking feature is especially useful for libraries that don’t have access
to citation indexes such as Web of Science.
One of the biggest limitations of GS is that it doesn’t provide a list of journals it indexes.
For any subscription journal tool, we can browse a list of indexed journals by subject to help
determine if the tool is worth using for the question at hand. Another issue is that of multiple
versions, which is problematic in both GS and GBS, but for different reasons. GBS does not
collocate various versions of an item scanned from different sources, and each version may be of
vastly different quality, so it is up to the user to determine which version of an item is acceptable
for their use. In the case of GS, there may be a preprint and a published version of the same
article; but again, like items are not collocated in the results, and so the user may find one but not
know the other exists. This is perhaps less of a problem for music/humanities than the sciences,
but is still worth knowing.19
Literature Review
Although Google Book Search has gotten plenty of ink in the popular and trade media,20
especially concerning its legality, there are fewer articles concerning Open Content Alliance, and
altogether fewer scholarly articles concerning evaluations of either GBS’s or OCA’s content.
Kalev Leetaru and Robert Lackie’s 2008 articles provide a good history of Google Books and
OCA and comparisons of their functionality.21 Paul Duguid’s 2007 article attempts to test the
levels of quality assurance present in Google Books and finds it somewhat lacking.22 Many
articles, including Duguid’s, observe that GBS has some issues with scanning quality. They
note that some images are blurred or partially obscured, and that there are examples of the
metadata not matching the image files (what’s scanned is not what it says it is). Jonathan
Bengston provides one of the few articles solely about OCA and its history.23
GBS was favorably reviewed for the Music Library Association’s periodical Notes in
September of 2008.24 Joseph Grobelny outlined the various strengths and weaknesses of the tool
in terms of searching and content related to music. He notes that there are several “historical
American music resources, like Louis Elson’s The History of American Music (1915) or O. G.
Sonneck’s Bibliography of Early Secular American Music (1905) found in GBS. There is also
full-text access to Grove’s Dictionary of Music and Musicians (1907 and 1920) and Baker’s
Biographical Dictionary of Musicians (1905 and 1919), along with numerous other music
reference sources.”25 He found that based on searches by LC subject headings, the content is
stronger for things found in ML and MT ranges, which makes sense, since those are used for
books, while scores (LC class M) receive less coverage. He also determined that the broad
categories of music history, musical analysis, music theory, jazz, and rock/popular music and
music instruction are well represented, and that there is a weakness in world music.
Studies of Google Scholar vary in their estimations of its quality, content, currency, and
search precision. This is in large part due to the methodologies and criteria employed, but also
because of the databases used as targets for comparison. Researchers comparing GS to a general
abstracting and indexing database, or against subject-specific databases from several disciplines,
have found different results than those comparing GS to multiple databases in the same
discipline.
When it was first launched, the content in GS seemed to focus primarily on the sciences
(for example, subject-specific evaluations of GS have been conducted by John Meier and
Thomas Conkling (Engineering) and Michael Levine-Clark and Joseph Kraus (Chemistry))26, but
now there is a surprising amount of content in other disciplines, including music. According to
the 2006 article “The Depth and Breadth of Google Scholar: An Empirical Study,” of which one
of the authors is a music librarian, Google Scholar contains 6% of the citations in the
International Index to Music Periodicals, 30% of JSTOR (which has over 40 music titles), and
94% of the Cambridge Journals Online (which has 14 music and drama titles).27 Based on
percentages of journal titles covered, they draw the conclusion that GS is best for research in the
sciences and not as good for social sciences or humanities topics. Philipp Mayr and Anne-Kathrin Walter found in their comparison of Google Scholar against the Web of Science
(including the citation indices for Arts and Humanities, Social Sciences, and Science) and Open
Access journals from the Directory of Open Access Journals, that 80.5% of the journals from the
Arts and Humanities Citation Index are included in GS (41.8% via links, 50.73% as citations
only, and 7.49% as full-text), and that for titles from the Directory of Open Access Journals
(which has 30 music titles), 22.11% are included in full text, 48.2% are available via links, and
29.61% are available as citations only.28
Susan Gardner and Susanna Eng, in their comparison of GS against some of the standard
social sciences databases, determined that GS had more content, a greater variety of document
types, and that results had greater relevancy rankings. But, they found that GS was not as current
in its indexing as the other tools.29 In his study comparing GS’s recall and precision against other
bibliographic databases (mostly general, social science, and science) William Walters found that
“the idiosyncrasies of Google Scholar’s search mechanism--the absence of controlled subject
terms for example--do not compromise its ability to retrieve relevant results in response to
simple keywords searches. In fact, the GS mechanism performs better than most.”30 He
concludes that “These findings suggest that a searcher who is unwilling to search multiple
databases or to adopt a sophisticated search strategy is likely to achieve better than average recall
and precision by using Google Scholar.”31 However, he does provide the caveat that results may
vary if the study were performed with different databases in a different field. In fact, in her
article focusing on core literature coverage of ecology in GS, Marilyn Christianson found that
GS only indexed 57-77 percent of articles from a sample list of articles from core ecology journals,
or, if held to a higher level of citation standards, only about fifty percent of the articles were
indexed.32
One of the harshest evaluators of GS, Peter Jascó, reports in “Google Scholar Revisited”
(a 2008 reprise of the studies he did in 2005 soon after it came on the scene), that there are
various searching abnormalities that make it difficult to get accurate results.33 He cites
specifically problems with “innumeracy” and “illiteracy.” The first term refers to inconsistent
results in GS when trying to limit searches (e.g., using the Boolean operator OR should increase
the number of results returned, but instead the number decreased). In addition, even when large result sets
are returned, only the first 1000 results will be listed.34 “Illiteracy” refers to GS’s “deficiencies
distinguishing author names from other parts of the text using its parsing algorithm” (e.g., author
names such as M Data, R Findings, and N. Vietnam).35 While Jascó agrees that GS’s sheer size
and the fact that it includes many document types not normally covered in A&I databases are
good things, he balances that with the observations that it’s impossible to determine GS’s true
size and that it does not cover as many of the open access materials (such as PubMed and
Nature) as it could.
Discovering what they hold for music scholars: methodology
This study of Google Book Search (GBS) and Open Content Alliance (OCA)
focused on printed music (scores) content, with locating music-related book and journal content
a secondary concern. Music scores are unique to the discipline, while every discipline has
monographs and journals. The search options available in these tools and metadata provided
challenges to this task, as they are not functionally equivalent to library catalogs. In addition,
there are no options for sorting and manipulating the search results in GBS and only a maximum
of 1000 items can be seen at a time. OCA results can be sorted by Average rating, Download
count, Date, or Date added, but not by the traditional methods of author, title, or publisher.
Results can also be grouped by Relevance, Mediatype (as in file type, not by format of the
original item), or Collection (American Libraries, Canadian Libraries, Universal Library, Project
Gutenberg, etc.). But perhaps the largest problem is the one of format—there is no way to limit a
search in GBS or OCA to printed music. An inherent difficulty in searching for printed music (or
recordings) in any system, whether it is a library online catalog or a digital library, is the need to
distinguish whether you want materials written by a particular composer or written about him,
and whether you want the particular piece of music itself, or writings about it. In library catalogs
this can be accomplished through a combination of limiting by format, and being cognizant of
which terms we put in the author, subject, and title search boxes. These exact options, however,
are not available in GBS or OCA.
Google Book Search
In order to identify music scores in GBS, searching concentrated on items that were
available in at least limited or full view, not citation only. In the first phase of searching
the advanced search “Return books on subject” option was used to search GBS for the following
LC subject headings: Hymns, Piano music, Sonatas piano, Violin music, Sonatas violin, and
Orchestral music. These searches revealed that there are machine-assigned keywords for music,
piano music, Sonatas (piano), songs with piano, orchestral music, etc. which helped narrow
further searches. Unfortunately, GBS’s subject search does not differentiate between the subject
term “music” meaning “book about music” or “printed music.” As Grobelny stated in his review
of GBS, “…their application of subjects to books is uneven. Many Library of Congress subject
headings (LCSH) get results with the subject search […] Many items […] have rudimentary
subjects, such as orchestral scores that have only one subject: ‘Music.’” He also notes that due to
lack of strict authority control, older forms of LC subject headings are present if they were
present in the contributing library’s metadata. While some subject headings are better than none,
this might pose a complication for some searchers.
The second and third phases emphasized finding book and journal content but used
different search approaches. In the second phase searches for known items were performed,
particularly reference books and historic journals, with focus on those available in full view.
Searches were also conducted on publisher names that occurred more frequently than others in
the initial search results for music scores.
In the third phase, broad keyword searches were performed for some basic concepts,
much in the manner an undergraduate might use when beginning a research project. The topical
search terms “music business,” “music education,” “music and copyright,” “music theory,” “rap
music,” and “Sarum gradual” (an important 13th century text) were used, and searches were not
limited to full-view only. Any number of searches could be conducted for various terms relating
to music. These were chosen because they represent a wide spectrum of basic searches patrons
might make. Of course many real-life searches may include more specific terms and concepts.
OCA
In evaluating OCA the focus was on the American Libraries sub-collection because it
was likely to be closest in scholarly content to Google Books, and also made the searches more
manageable. While OCA does have an advanced search, it is a bit harder to use because there are
“Custom fields” which have been used by contributors in different ways and “Media types”
which are also not consistent. The Media types “audio” and “music,” “Sound,” “sound,” and “au
dio [sic]” seem to apply to recorded sound, while “Text,” “Texts,” “text,” and “texts” can be
used for limiting search results to text, with different results. I searched for media type= Texts
and subject = music and also ran the same subject, known-item and keyword searches as in GBS.
These media limits were not an option in GBS.
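The inconsistency in media type labels described above can be illustrated with a minimal sketch. It assumes, following the observation in this section, that the "audio," "music," and "sound" variants all denote recorded sound and that the "text" variants all denote text. The mapping table and the function name `normalize_media_type` are hypothetical and are not part of any Internet Archive interface.

```python
# Hypothetical illustration: collapse inconsistent media-type labels
# into two canonical groups before filtering search results.
# The grouping follows the labels observed in this study; it is an assumption,
# not documented Internet Archive behavior.
CANONICAL = {
    "audio": "sound",
    "music": "sound",
    "sound": "sound",
    "text": "text",
    "texts": "text",
}


def normalize_media_type(label: str) -> str:
    """Remove all whitespace, lowercase the label, and map it to a canonical group."""
    key = "".join(label.split()).lower()
    return CANONICAL.get(key, "unknown")


if __name__ == "__main__":
    for raw in ["audio", "Sound", "au dio", "Text", "texts", "video"]:
        print(f"{raw!r} -> {normalize_media_type(raw)}")
```

With a normalization step of this kind, a searcher would not have to repeat the same query once for each spelling of a media type.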
Google Scholar
In searching for music-related journals and materials in Google Scholar (GS), the goal
was not, as in other studies, to determine a precise number of titles included (an effort that
would be rendered moot if GS would make public its title lists and sources), but rather to
determine in a broader sense what is included that is relevant for music scholars. Anecdotal
evidence and studies such as Neuhaus et al (2006) and Mayr and Walter (2008) suggested that
there were not likely to be large numbers of music titles. Therefore, no effort was made to
compare GS to the primary music journal indexing tools; it was instead compared to the more
general journal database Academic Search Premier from Ebsco. Given GS’s scope, music scores
were not likely to be found, and therefore those search phases from GBS and OCA were not
replicated.
Unlike GBS or OCA, GS allows searches to be limited to broad subject areas. In the
normal reference desk setting searches might be limited to “social sciences/arts/humanities,” to
make the results more precise and manageable. There is, however, content relevant to music
scholars in most if not all of the categories, especially those researchers with an interdisciplinary
focus. For the purposes of this study searches were not limited by subject area because there is
not a similar feature in Academic Search Premier.
The search for music materials was conducted in two phases. First, because several
studies claimed that GS has good coverage of open access journals, searches were performed for
publication names of the music titles on the Directory of Open Access Journals (DOAJ) list.36 In
a second phase, topical searches for “music business,” “music and copyright,” “music theory,”
“rap music,” “music education,” and “Sarum gradual” were conducted. And finally, these same
keyword searches were conducted in Academic Search Premier to compare the results against
GS’s.
Findings
Google Book Search
The subject search in GBS (see Table 1 for results), while offering some control, still
leaves a lot to be desired in accuracy because some of the subject keywords in the item records
come from library metadata and some are machine assigned during the OCR process. There are
more than two violin sonatas in GBS, for example, but they don’t necessarily have subject terms that indicate this.
Additionally, as mentioned earlier, there is no format limit capability in searching and a subject
of “violin music” could mean violin music scores or books about violin music, so this is not a
perfectly accurate representation of score holdings in GBS.
[insert table 1]
In comparison, a simple keyword search for “piano music” in full view returned over 2,800
items. These include scores, books, song books like “Songs of Columbia,” and vocal scores of
operas and operettas. A few specific examples of scores include Mendelssohn’s Song without
Words (Schirmer), the Bach/Czerny Well-tempered clavichord (Schirmer), and MacDowell’s
Sonata Tragica (Schirmer). A large percentage of these scores were contributed by Harvard.
Hymnals are another form of printed music well-represented in GBS—presumably because they
are usually “book” sized, have text as well as music, and don’t pose any special scanning
problems.
Not surprisingly, one music publisher well-represented in GBS is Dover, which produces
well-known reprints of other publishers’ public domain works. When a search for subject= music
and publisher=Dover was conducted 3,373 items were returned. However, only 151 results were
viewable. As mentioned earlier, Jascó noted that only the first 1,000 search results are viewable,
even on searches that returned more than 1,000 items. Nevertheless, entire search result sets were
consistently not viewable, even in sets numbering fewer than 1,000 items. Perhaps this is related
to how the search algorithms function—it may be counting occurrences of search terms, not
items, so an item that contains the search term three times counts as three results. In trying to
create a smaller result set to test, a search for all Dover scores composed by Mozart was
performed. GBS claimed to have 750 scores but again, only 318 items could be seen. All of the
Dover scores are categorized as “limited preview” but all of the music content seems to be
present. What’s missing appears to be introductory matter still under copyright to Dover.
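The conjecture in this paragraph, that a reported total may count occurrences of the search term rather than distinct items, can be illustrated with a toy example. This is only a sketch of the hypothesis: the sample records and the function name `count_hits` are invented for illustration, and nothing here reflects how Google's search actually works.

```python
def count_hits(items: dict[str, str], term: str) -> tuple[int, int]:
    """Return (total occurrences of term, number of items containing it)."""
    term = term.lower()
    occurrences = 0
    matching_items = 0
    for text in items.values():
        n = text.lower().count(term)  # non-overlapping occurrences in this item
        occurrences += n
        if n > 0:
            matching_items += 1
    return occurrences, matching_items


# Invented sample records standing in for item metadata or text.
sample = {
    "score_a": "Mozart sonatas. Mozart, ed. Dover. Mozart.",
    "score_b": "Mozart concerto",
    "score_c": "Haydn quartets",
}

print(count_hits(sample, "mozart"))  # prints (4, 2)
```

If a system reported the first number as its result count but listed only distinct items, the totals would diverge in the way observed here, with four "results" but only two viewable items.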
Other publisher names appeared frequently in initial searches, so some searches were
performed on the publisher field without limiting to “full view.” There are over 12,000 Schirmer
items, including books, scores, and some periodicals. A search for A-R Editions netted over
2,000 scores in “limited preview only” from their Recent Researches series37 and other score
publications. Other music publishers represented include Fischer, Ayer, Oliver Ditson, Presser,
Novello, and Universal Edition, among others.
It is possible to limit GBS searches to “books” or “magazines,” but unfortunately,
“magazines” refers to the popular magazine content they added in late 2008, not scholarly
journals.38 This is relevant to music scholars because among the new popular titles added is
Billboard from 1942 forward in full view, although the run is not complete. It’s unclear why there are
gaps in the run, and this would likely be very confusing to users expecting to find issues from the
missing years.
Music periodicals are not widely represented in GBS in full text, but there are a few
exceptions, including these historically significant titles:

(Leipziger) Allgemeine musikalische Zeitung 1822-1882 complete, and some earlier issues
between 1798 and 1816

Rivista Musicale Italiana 1894-1908

Several issues of Die Musik from 1901-1908

A dozen issues of the Musical Times from the 1880s through the early 1900s

Two dozen issues of The Musical World ranging from the late 1840s to the 1880s
Through known-title searching and from observing search results during other searches,
music monographs, reference works, opera libretti, and ballads all appear to be well-represented
in full-text. A few specific examples include the Biographie universelle by Fétis,39 Eitner’s
Bibliographie der Musik-Sammelwerke,40 Stanbrook Abbey’s 1897 Gregorian Music: An Outline
of Musical Palæography, Music of the Japanese from the 1891 Transcripts of the Asiatic Society
of Japan, 100 of the roughly 300 pages of Barry Brook’s Thematic Catalogues in Music, and
Grove’s “Dictionary of Music and Musicians.” A title search for this work returned 325 full-view items, but the results screens only listed 12 items. These volumes come from the first and
second editions of Grove, but neither is represented in full.41
Searches were also conducted on “all books” for a few subject concepts using “exact phrase”
keyword searching except where noted. See Table 2 for results.
[insert table 2]
As noted, the search for “music business” returned over 3,000 items; the first three screens of
results included primarily “limited preview only” items, many of which were less than ten years
old. Narrowing the search by clicking on Google’s suggested heading “music / business aspects”
returned just over 1,800 items. In contrast, searching WorldCat for “music business” and limiting
to books returned only 634 items. A search in GBS for “music education” returned over 3,600
items. The same search in WorldCat nets 9,600 books. The phrase “rap music” in GBS nets over
1,800 items and suggests the following ways to narrow the search: Refine results for rap music:
Music / Genres & Styles / Rap & Hip Hop; Music / Ethnic; Biography & Autobiography /
Composers & Musicians; Social Science / Popular Culture. While this study does not attempt to
formally evaluate the relevance of the search results, certainly not all of the titles returned are
relevant. However, a great number of them will be of use to scholars of all levels, even if the full
text of the item is not available. The “Find in a library” function will help lead them to the item
or hopefully to the reference desk, if needed.
Open Content Alliance/Internet Archive
The first thing to note about OCA’s text content is that there is some overlap between it
and Google Books because some institutions (such as University of Illinois) deposit their
Google-scanned content into OCA. An initial search of OCA for media type = Texts AND
subject = music resulted in 2,562 items and the vast majority of it was in fact printed music. See
Table 3 for results.
[Insert table 3]
Perhaps the most notable score content to date is University of Illinois’ contribution of almost
100 opera and musical vocal scores, including:

The Mikado by Gilbert and Sullivan (with 2,247 downloads as of 6/8/09)

Have a Heart by Jerome Kern (with 261 downloads)

Die tote Stadt by Erich Korngold (with 84 downloads)
The NYPL has also deposited many scores (primarily Schirmer and Fischer editions), including
piano and violin music of Mozart, Brahms, Bartok, Tchaikovsky, Copland, Ibert, Schubert, Liszt,
Debussy, Chopin, Sarasate, and even Bruch’s 2nd violin concerto. Some of these are scores only,
but some are scores with parts. In one example OCA has seamlessly addressed the problem of
representing a score and part, with the cello part simply following the piano part with a few
clicks.42 There are fewer large-scale works included, but there is, for example, a miniature
(study) score of Holst’s The Planets.43
As in GBS, hymnals and opera libretti are well-represented in OCA. There are also a fair
number of music monographs, reference works, and even periodicals. A keyword search for
“music” with media type = texts returned 4,214 items, including several issues of Music
magazine (although the results list does not show which issues until each is opened), the Oxford
Dictionary of Music (1950, 6th printing; 2,876 downloads), History of Arabian Music from
1929, and volumes from the first and second editions of Grove; again, as in GBS, neither edition is
represented in full. The Boston Library Consortium has contributed a number of important
monographs under the direction of their music librarians, including Eitner and Fétis, and all
volumes of the Pazdírek Universal-Handbuch der Musikliteratur (which has no text in Google,
just citations for some of the volumes). Incidentally, it was not possible to find this work through
an author search if the diacritic mark over the “i” was omitted.
Keyword searching in OCA will produce vastly different results than in GBS for several
reasons. See Table 4 below. There is simply less material in OCA, and the full text of items is
not searched in OCA as it is in GBS (unless the search has been limited to one item); instead, only
the item metadata is searched. Also, because OCA focuses more closely on texts in the public
domain, there are no records for contemporary materials as there are in GBS. Therefore, modern
topics such as “music business” and “rap music” will have few to no hits in OCA.
[insert table 4]
This is a good example of needing to choose the right tool for the job.
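For readers who wish to repeat metadata-only searches of this kind (for example, media type = Texts AND subject = music) outside the web interface, the following is a minimal sketch rather than the method used in this study. It assumes the Internet Archive's public advanced-search endpoint and its `mediatype` and `subject` metadata fields; the helper names `build_query` and `build_search_url` are hypothetical and exist only for illustration. The sketch builds the search URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: the Internet Archive's public advanced-search endpoint.
ADVANCED_SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_query(media_type, subject=None, keyword=None):
    """Combine metadata clauses with AND (hypothetical helper).

    Only item metadata is matched by such a query, which mirrors the
    behaviour described above: the full text of items is not searched.
    """
    clauses = [f"mediatype:{media_type}"]
    if subject:
        clauses.append(f'subject:"{subject}"')
    if keyword:
        # A bare quoted phrase is treated as an exact-phrase keyword.
        clauses.append(f'"{keyword}"')
    return " AND ".join(clauses)


def build_search_url(query, rows=50):
    """Return a URL requesting identifiers and titles as JSON."""
    params = [
        ("q", query),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("rows", rows),
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The subject search reported in Table 3, expressed as a query.
    print(build_search_url(build_query("texts", subject="music")))
```

Calling `build_query("texts", keyword="rap music")` would express one of the keyword searches in the same form. Counts returned today will differ from those reported here, since the collection has changed since 2009.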
Google Scholar
A search in Google Scholar for the music titles listed in the Directory of Open Access Journals
(DOAJ) showed that of the thirty titles, five (16.7%) were not represented at all
(all five are foreign-language journals), twelve had ten or fewer results, and only four had a
significant number of results (one of which is a Spanish-language journal). GS’s help pages
caution the user about searching for journal titles, as they are not consistently recorded.44 GS
contains full text and/or direct links to vendor databases for over thirty major music journals,
including Psychology of Music, Computer Music Journal, Ethnomusicology, Journal of Research
in Music Education, Contemporary Music Review, and Journal of Popular Music Studies. There
are several music therapy titles, likely because of the field’s interdisciplinary nature and the
large number of scientific and medical journal titles searched by GS.
GS also contains dissertations and conference proceedings including the Proceedings of
the International Computer Music Conference, the International Conference on Music Perception
and Cognition, and The International Society for Music Information Retrieval (ISMIR). It also
contains entries for patents via Google Patents, such as the one for the Music page score turner.45
The following searches could not be limited to “subject” as was done in GBS and OCA,
which is the main reason the result sets are so much larger in GS than in the other tools. See
Table 5 for results.
[insert table 5]
The same searches were performed in Ebsco’s Academic Search Premier without applying any
limits. See Table 6 for results.
[insert table 6]
Conclusions
There is a lot of music-related material in Google Book Search, Open Content Alliance,
and Google Scholar, more so than might have been expected. While there is not as much printed
music as books about music or journal articles for various reasons, these tools can serve a useful
purpose for music scholars and others looking for music materials. For those patrons searching
for oft-checked-out materials, important historic monographs or journals, reference works that
don’t circulate, or interdisciplinary topics, these three tools are worth utilizing. They provide a
level of access beyond what our library catalogs can (in terms of full-text searching), and with
the “Find in a Library” features in GBS and GS, can serve as a way to guide users to more
traditional resources, services, and reference help if needed.
Nevertheless, while these tools seem dazzling (especially to our users) because of their
vast amounts of content and apparent ease of use, they are not without their problems, especially
for serious scholars who have some familiarity with standard library research catalogs and
journal databases. First and foremost, metadata is problematic for both GBS and the OCA. It is
often difficult to determine what an item really is (Is it a score or a libretto? What edition is it?)
until viewing it. GS also contains very brief citations, often with misinformation, which can
make it hard to identify an item. In some ways these tools are best for known-item searching, at
least for items available in limited or full view, especially historic texts (as one would expect
given copyright issues).
Further problems with GBS include the enormous size of the database; even focused
searches can return too many hits to wade through practically. This is compounded by the fact
that sometimes what is described isn’t what has been scanned and some scans are unreadable
(blurring, gutter loss, partial obfuscation). The limitations of the searching and results
manipulation of all three tools have been mentioned previously.
Where GS, and to some extent GBS, will be of most use to music librarians and scholars
is for topics that go beyond individual journals or cross disciplines such as acoustics, music and
popular culture, music and media, music cognition, the psychology of music, and music and
marketing or consumerism. One example of this last topic is “Measuring the Effect of Music
Downloads on Music Purchases” from the Journal of Law and Economics,46 a prime example of
the type of article often requested at the reference desk that wouldn’t likely be found in a
traditional music-centered journal database.
Of course the question many music librarians and scholars have is why there isn’t more
music material, especially printed music, in GBS and OCA. There are several factors at work—
the inability of OCR to handle musical scores and vastly inconsistent physical sizes of printed
scores (including scores with separate physical parts) being the two primary ones. Although
scanning can be outsourced for larger items, it requires money that could make this impractical
on a large scale for some institutions (the outsourcing itself can be quite cost-effective, but it
adds another layer to the logistics of the project). Another reason why more historic popular
sheet music (often the focus of music digital library projects because much of it is in the public
domain) isn’t included is that most libraries lack the resources to fully catalog
individual titles. Both GBS and OCA require that items have some sort of an individual record in
an online library catalog that can be captured. The issue of coverage in Google Scholar is also a
question. Since GS does include some of the DOAJ music journals, why not all of them? It also
includes music journals via other sources such as JSTOR, Sage, Oxford, and so on; again, why
not all of them?
Further studies could more specifically compare the music content in Google Scholar to
music journal databases, although there is perhaps not yet enough relevant content to make this
worthwhile. Another possibility is to evaluate the academic value of audio music content in the
Internet Archive (and perhaps YouTube). Students and others have learned that YouTube can be
invaluable for locating performances of obscure or new compositions that may not be available
on commercial recordings.
So where does this leave us in our evaluation of these tools for music scholars? Should
we turn away from traditional library tools in favor of these three tools? No. Are there times
when using these tools will be called for? Yes. Should we be more tolerant of students and
patrons using these tools? Yes. Do I hope that these tools will improve and incorporate more of
the features so valued by librarians? Yes. There are music scholars who are not yet aware of the
benefits to be gained from using these tools. While we as librarians may not recommend these
tools to them over our existing library resources and online subscription tools, we should
maintain an awareness of what they have to offer and know when to suggest them to our users to
further their research.
Appendix: Google Scholar Music Journals List
Action, Criticism and Theory for Music Education (DOAJ)
British Journal of Music Education (via Cambridge journals*)
Computer Music Journal (via MIT press and ACM Portal and JSTOR)
Contemporary Music Review (via IngentaConnect/InformaWorld)
Critical Studies in Improvisation (DOAJ)
Early Music (via Oxford Press)
Empirical Musicology Review (DOAJ)
Ethnomusicology Forum (via Informaworld)
Ethnomusicology
International Journal of Community Music (DOAJ)
International Journal of Music Education (via Sage)
Journal of Music Theory (via JSTOR)
Journal of Music Therapy (via PubMed)
Journal of New Music Research (via Informaworld and IngentaConnect)
Journal of Popular Music Studies (via Blackwell Synergy/Wiley InterScience)
Journal of Research in Music Education (via ERIC.ed.gov)
Journal of the Acoustical Society of America (via PubMed)
Journal of the Society for Musicology in Ireland
Journal of Voice (via Elsevier)
Leonardo Music Journal (via MIT Press)
Music and Letters (via Oxford Journals)
Music Education Research (via IngentaConnect/InformaWorld)
Music Educators Journal (via JSTOR)
Music Perception
Music Reference Services Quarterly (via Haworth/Ingenta)
Music Theory Online (DOAJ, mostly citations)
Music Theory Spectrum (via CALIBER/ JSTOR)
Music Therapy
Music Therapy Today (DOAJ)
Notes: Quarterly Journal of the Music Library Association (via JSTOR)
Perspectives in New Music (via JSTOR)
Philosophy of Music Education Review (via Muse and ERIC)
Popular Music & Society (via Ingenta)
Popular Music History
Popular Music
Psychology of Music (via Sage)
Psychomusicology (via PsycInfo)
Revista Musical Chilena (DOAJ)
TRANS: TRanscultural Music Review (DOAJ)
Voices: A World Forum for Music Therapy (DOAJ)
*Note that sources may vary by institutional subscription availability.
Notes
1. Cathy DeRosa, “Perceptions of Libraries and Information Resources: A Report to the OCLC
Membership” (Dublin, OH: OCLC, 2005).
2. These include Petrucci Music Library http://imslp.org/ (accessed July 9, 2009), the Library of
Congress’s Performing Arts Encyclopedia, which contains many digitized pieces of sheet music
http://www.loc.gov/performingarts/ (accessed July 9, 2009), and other online sheet music projects
listed here: http://library.duke.edu/music/sheetmusic/collections.html (accessed July 9, 2009).
3. Karen Schneider, The Free Range Librarian blog,
http://freerangelibrarian.com/2006/06/03/the-user-is-not-broken-a-meme-masquerading-as-a-manifesto/
(accessed June 9, 2009).
4. Preservation digitization, on the other hand, focuses on producing as authentic a digital surrogate
of the original as possible. It may also increase access and discovery, but is often more
time-consuming and costly to accomplish.
5. Google Book Search Help, “Why can’t I read the entire book?”
http://books.google.com/support/bin/answer.py?answer=43729&cbid=1eznom41z7nze&src=cb&lev=index
(accessed June 9, 2009).
6. Tom Turvey (Google Book Search), “The Universal Collection,” (talk given at the CIC-CLI
Off-the-Shelf conference, Bloomington, IN, May 19, 2009).
7. Turvey indicated that ~60% of all content was from 1964-present and ~20% was from 1923-1963.
8. Kalev Leetaru, “Mass Book Digitization: The Deeper Story of Google Books and the Open Content
Alliance,” First Monday 13, 10 (October 6, 2008).
9. Ibid.
10. Open Content Alliance, “Contributors,” http://www.opencontentalliance.org/contributors/ (accessed
July 9, 2009).
11. Betsy Kruger, University of Illinois at Urbana-Champaign, Head of Digital Content Creation and
Illinois’ Google Project Manager, email message to author, June 25, 2009.
12. Leetaru, 2008.
13. Ibid.
14. Ibid.
15. Google Scholar, “About Google Scholar,” http://scholar.google.com/intl/en/scholar/about.html
(accessed June 9, 2009).
16. Burton Callicott and Debbie Vaghn, “Google Scholar vs. Library Scholar: Testing the Performance
of Schoogle,” Internet Reference Services Quarterly 10, 3/4 (April 2006): 71-88.
17. Google Scholar, “About Google Scholar,” http://scholar.google.com/intl/en/scholar/about.html
(accessed June 9, 2009).
18. Jeffrey Young, “100 Colleges Sign Up with Google to Speed Access to Library Resources,”
Chronicle of Higher Education 51, 27 (May 20, 2005): A30.
19. Carol Tenopir, “Google in the Academic Library,” Library Journal February 1, 2005, 32.
20. Charles W. Bailey, Jr., “Google Book Search Bibliography, version 3” (December 8, 2008),
http://www.digital-scholarship.org/gbsb/ (accessed June 9, 2009).
21. Robert J. Lackie, “From Google Print to Google Book Search: The Controversial Initiative and Its
Impact on Other Remarkable Digitization Projects,” The Reference Librarian 49, 1 (August 2008):
35-53 and Leetaru, 2008.
22. Paul Duguid, “Inheritance and Loss? A Brief Survey of Google Books,” First Monday 12, 8 (August
6, 2007).
23. Jonathan B. Bengston, “The Birth of the Universal Library,” Library Journal (Spring 2006): 2-4, 6.
24. Joseph Grobelny, “Google Book Search, and: Live Search Books (review),” Notes 65, 1 (September
2008): 136-140.
25. Ibid, 139.
26. John J. Meier and Thomas W. Conkling, “Google Scholar’s Coverage of the Engineering Literature:
An Empirical Study,” The Journal of Academic Librarianship 34, 3 (May 2008): 196-201 and Michael
Levine-Clark and Joseph Kraus, “Finding Chemistry Information Using Google Scholar: A
Comparison with Chemical Abstracts Service,” Science and Technology Libraries 27, 4 (August
2007): 3-17.
27. Chris Neuhaus, Ellen Neuhaus, Alan Asher, and Clint Wrede, “The Depth and Breadth of Google
Scholar: An Empirical Study,” portal: Libraries and the Academy 6, 2 (April 2006): 135.
28. Philipp Mayr and Anne-Kathrin Walter, “Studying Journal Coverage in Google Scholar,” Journal
of Library Administration 47, 1/2 (September 2008): 93-4.
29. Susan Gardner and Susanna Eng, “Gaga over Google? Scholar in the Social Sciences,” Library Hi
Tech News 22, 8 (2005): 42-5.
30. William H. Walters, “Google Scholar Search Performance: Comparative Recall and Precision,”
portal: Libraries and the Academy 9, 1 (January 2009): 10.
31. Walters, 16.
32. Marilyn Christianson, “Ecology Articles in Google Scholar: Levels of Access to Articles in Core
Journals,” Issues in Science and Technology Librarianship 49 (Winter 2007),
http://www.istl.org/07winter/refereed.html (accessed June 9, 2009).
33. Peter Jacsó, “Google Scholar Revisited,” Online Information Review 32, 1 (2008): 102-114.
34. Ibid, 107.
35. Ibid, 110.
36. DOAJ journals belonging to subject “Music” http://www.doaj.org/doaj?func=subject&cpid=6
(accessed June 9, 2009).
37. Recent Researches in the Music of the Middle Ages and Early Renaissance, Recent Researches in
the Music of the Renaissance, Recent Researches in the Music of the Baroque Era, Recent
Researches in the Music of the Classical Era, Recent Researches in the Oral Traditions of Music, and
Recent Researches in American Music.
38. The Official Google Blog,
http://googleblog.blogspot.com/2008/12/search-and-find-magazines-on-google.html
(accessed June 9, 2009).
39. François-Joseph Fétis, Biographie universelle des musiciens: et bibliographie générale de la
musique, (Paris: Firmin Didot frères, fils et cie, 1878-81).
40. Robert Eitner, Bibliographie der Musik-Sammelwerke des XVI. und XVII. Jahrhunderts, (Berlin:
L. Liepmannssohn, 1877).
41. I could find the following volumes: 1890 v1 (2), 1890 index volume (2), 1889 v4 (2), 1890 v4, 1911
v1, 1911 v5, 1920 v6 “American Supplement” (2) as of February 2009.
42. W. H. Squire, At Twilight = Triste, (New York: C. Fischer, 1907),
http://www.archive.org/details/attwilighttriste00squi (accessed June 25, 2009).
43. Gustav Holst, The Planets: Suite for Large Orchestra, (London: Boosey and Hawkes, 1921),
http://www.archive.org/details/Holst_ThePlanets (accessed July 9, 2009).
44. Google Scholar, “Advanced Scholar Search Tips,”
http://scholar.google.com/intl/en/scholar/refinesearch.html (accessed June 25, 2009).
45. Music page score turner, RW Edwards, PC Stavrou - US Patent 7,238,872, 2007; Portable page
turner for music sheets, Douglas J. Carr et al, Patent number: 5203248, Filing date: Feb 25, 1992;
Page turner for music manuscripts and the like, Robert C. Burster, Patent number: 5052266, Filing
date: Apr 2, 1990.
46. Alejandro Zentner, “Measuring the Effect of Music Downloads on Music Purchases,” Journal of Law
and Economics 49, 1 (April 2006): 63-90.