P. Nieuwenhuysen
Libraries in the age of the Internet: NO to obsolescence and YES to synergy
(research paper, presented in the session on Cloud Computing and Libraries)
Published:
In The Proceedings of the Fifth Shanghai (Hangzhou) International Library Forum = SILF2010 with the theme “City life and library service”,
hosted by the Hangzhou Library in Hangzhou, China, 24-27 August 2010. Shanghai Scientific and Technological Literature Publishing
House, http://www.sstlp.com/, 2010, ISBN 978-7-5439-4415-2, 518 pp., pp. 452-463,
http://www.libnet.sh.cn/silf2010/english/index.htm
Libraries in the age of the Internet:
NO to obsolescence and YES to synergy
Paul Nieuwenhuysen
University Library
Vrije Universiteit Brussel = VUB
Pleinlaan 2
B-1050 Brussel
Belgium
Tel 32 2 629 2436
Paul.Nieuwenhuysen@vub.ac.be
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/professional/
Abstract
Introduction / background:
Libraries are active in a world where the Internet and the WWW offer information services that
increase in number, size and efficiency. Therefore it is increasingly important that libraries embrace
these expanding services in their work and activities. Realizing this efficiently in practice should be
based on investigations and assessments of services and methods.
Problem statements:
The purpose of this work was to evaluate the efficiency and effectiveness of embracing the Internet as
an additional tool to make information available to potential users. More concretely, we wanted to
find out if a purely informative WWW site developed as an information source can be discovered
effectively by end users, even though many other sites compete for visibility, mainly through WWW
search engines. Further outcomes should be recommendations on how to create an efficient WWW
presence and on how to assess the visibility or impact of a WWW site.
Methods:
A specific WWW site has been set up and developed in a particular subject domain. After stabilization,
several methods have been applied to analyze the visibility of the created WWW pages.
Results / findings:
The developed WWW pages are highly visible and well used. In particular, they can be discovered efficiently
using a WWW image search engine.
Discussion:
The various methods applied in the analysis yield a consistent view.
Conclusion / recommendations:
The positive results indicate that the view of the Internet as a difficult and highly competitive place to
offer information is realistic, but that this should not lead to pessimism and a passive attitude without
creative actions. With a simple approach, contributions to the Internet can reach users and be useful.
So an optimistic view is justified.
Recommendations are offered on how to optimize and how to assess the visibility of your WWW
pages.
Introduction / background / context
This conference contribution is a report of ongoing investigations by a scientific, academic librarian and
active user of scientific information.
To start with, the framework is presented. The broad aim is to find out how to adapt, change and
optimize the library services that we offer as an important part of the information landscape that is
evolving quite fast in this dynamic Internet age. Information can be found and accessed increasingly
through the Internet and WWW. As a consequence, nowadays librarians should evaluate, select, offer
and recommend information discovery services on the WWW to their clients. Therefore we assess, test
and evaluate the performance of public access information services that are offered through the
Internet and the WWW. Due to the constraints in the time available, we focus on a few information
services that seem valuable and important in the framework of libraries, so that they deserve to be
evaluated in a quantitative way. The aim is to provide a basis for decision making in the library
concerning the implementation of these services for our users. The background of this work is shaped
by information storage and retrieval systems through the Internet that have made spectacular
progress, while practical searching for information still confronts us with retrieval systems that are far
from perfect. The various investigations share the aims mentioned above, and also a well-defined
subject domain (theme, topic of the information contents), which is the same in all cases:
1. In an evaluation of book search services on the WWW (Nieuwenhuysen, 2008, 2010a): the subject
domain of most of the book titles that are searched.
2. In investigations of WWW image searching (Nieuwenhuysen, 2010b): the subject domain of the
WWW images and documents that are searched.
3. In an investigation of how effectively a created WWW site can be discovered and used (this
report): the subject domain of that WWW site.
The common and narrow subject domain is probably not essential for the feasibility of this kind of
investigation, but it has the advantage that it allows the investigator to exploit a relatively high level of
subject expertise and to increase this level even further during the ongoing investigation of the various
information systems. It is probably not the subject itself, but expertise in some (any) subject that is
necessary to carry out this kind of research in a reasonable, efficient and meaningful way.
Problem statements
1. A broad research area is how relevant classical library collections and services still can be, in
the face of the exploding quantity of information that people can discover and access through
the Internet and WWW, independent of libraries. This kind of investigation has become a
necessity since the birth of the Internet and becomes even more relevant, since classical
printed documents from libraries are digitized and made searchable and available on the
Internet at an impressive speed. Here we report on our quantitative analysis in a very
specific subject domain. Information that is held by libraries or other players in the
information world can be made accessible through the Internet. However, can potential users
discover this digitized or born-digital information easily and efficiently? If not, then efforts to
bring information to the Internet to enrich its contents make less sense. The answer to this
simple question, and the optimal way to realize efficient projects of this kind, are not
straightforward, since billions of documents are already available and searchable, all
competing for a high ranking in the prominent Internet search engines.
2. The experience gained in tackling the problem above should yield the following types of
recommendations:
a) How to create a WWW site that is well visible?
b) How to analyze the visibility of a WWW site?
Nomenclature
In this text we mainly use the following terms:
• online “visibility” as a synonym for “footprint” or “presence”
• “to analyze” as a synonym for “to test”, “to assess”, “to evaluate”
• “indicator” of website visibility, as a synonym for words used by other authors, such as “index”,
“measure”, “metric”, “proxy”
Methods and findings
The WWW site that is analyzed
The test subject domain for all investigations reported here is the domain of classical, ethnic, tribal art
objects created in Africa. A specific WWW site has been set up and developed over the years in the
chosen subject domain: http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/
This site is used now to test the feasibility of “exporting” information from an information intensive
organization to the Internet. Most of the site consists of a bibliography of books/monographs in the
chosen subject domain. This bibliography consists of classical WWW pages, each one about the books
published in a particular year. An example is
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-books-2009.html
Similar, competing WWW sites
Finding WWW sites that are similar or related to a known WWW page or site can be useful to discover
additional interesting information or --in the context of this paper-- to discover pages and sites that
compete with your own site. Then these competing pages can be inspected, for instance
• to detect overlapping content so that duplication of efforts can be avoided,
• to get ideas on how to improve your own site,
• to compare the visibility of your site with the competing sites, by using methods described
below.
To identify similar sites in practice, a WWW search engine like Google web search can be exploited free
of charge, using the function that is offered for this purpose. The algorithms used to determine
similarity are not described, but Aguillo (2009) writes that “Google associates websites according to
their link pattern, assuming two pages are closer if the overlap between their in- and out-links is high.
This provides a hypertextual neighborhood…”
This method can be taken one step further by applying the system
http://www.touchgraph.com/TGGoogleBrowser.html as recommended by Espadas et al. (2008) and by
Aguillo (2009). This service first finds similar sites through Google and afterwards adds value by
visualizing those sites on the computer display
• with their similarity relations, and
• in clusters shown in different colours, which have been created on the basis of the subject of
their content; the number of clusters can be decided by the user.
The figure shows the identification of similar WWW sites. Some of these were inspected more closely
for comparison with the analyzed site. More concretely, several indicators have been determined as
described further in this paper.
Figure: WWW sites similar to the analyzed WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/.
Note that the WWW site URLs are not shown fully in this static screenshot, but they do show up in the
working system, when the cursor is moved over a URL. Furthermore colours are relevant, but of course
these cannot be appreciated in a version of the figure that is printed in black and white.
Visibility of the WWW site, as reflected by links received
The number of links from other WWW pages to a particular WWW page is related to the concept of
visibility, as explained for instance by Espadas et al. (2008), Aguillo (2009) and De Andrés et al. (2009).
Therefore we have also performed a link analysis. The number of links received may seem simple and
well defined, but in practice this is not so. Even in the simpler case of the number of citations received by
scholarly publications, complications blur quantitative analysis and conclusions. Furthermore,
measuring the number of links received is not straightforward. In practice we have used two query
methods:
• In normal, simple search mode, queries were submitted to the classical Google WWW text
search engine, simply searching for a part of the URL of the WWW site that is unambiguous,
within quotation marks. An example is “nieuwenhuysen/african-art”.
• In “Advanced Search”|“Page-specific tools”|“Find pages that link to the page”, the same
queries were submitted. This corresponds to searching in the command mode with the link
operator. An example is link:nieuwenhuysen/african-art.
Google WWW text search was used to search for the part of the URL that occurs in all the pages that
form the analyzed WWW site: “nieuwenhuysen/african-art”. In various tests, this gave about 11000 or
12000 hits.
In “Advanced Search”|“Page-specific tools”|“Find pages that link to the page”, the same query was
submitted. In various tests this yielded about 400 or 1500 hits. This is a lower number, as expected.
A search for one page only uses a query with "nieuwenhuysen/african-art/african-art-collection-masks".
This yields about 900 hits, which is lower than when links to any page of the site are searched, as
expected.
In “Advanced Search”|“Page-specific tools”|“Find pages that link to the page”, the same query yields
about 200 hits. This is a lower number, as expected.
In “Advanced Search”|“Page-specific tools”|“Find pages that link to the page”, almost the same query
nieuwenhuysen/african-art/african-art-collection-masks.htm, but with the file name extension .htm
explicitly added at the end, yields about 110 hits. This is a lower number, as expected.
The hit counts found through “Advanced Search”|“Page-specific tools”|“Find
pages that link to the page” or, equivalently, by using the link: operator are more relevant than those of a
simple URL query: fewer links are found, and these are similar to real, classical
citations.
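The two query forms described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the base address `https://www.google.com/search` with its `q` parameter is an assumption about the public search URL, not something specified in this paper. The sketch only builds the query strings; it does not submit them or read hit counts.

```python
from urllib.parse import urlencode

# Assumed public search address; not taken from the paper.
SEARCH_BASE = "https://www.google.com/search"


def phrase_query(url_fragment):
    """Simple search mode: the URL fragment within quotation marks."""
    return f'"{url_fragment}"'


def link_query(url_fragment):
    """Command mode: the same fragment with the link: operator."""
    return f"link:{url_fragment}"


def search_url(query, base=SEARCH_BASE):
    """Encode a query string into a search URL (hypothetical helper)."""
    return f"{base}?{urlencode({'q': query})}"


fragment = "nieuwenhuysen/african-art"
for query in (phrase_query(fragment), link_query(fragment)):
    print(query, "->", search_url(query))
```

Keeping the two query builders separate makes it easy to submit both forms for the same URL fragment and to compare the hit counts, as done in this section.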
For comparison, some queries have been executed to get an idea of the links to other WWW sites in
the same subject domain, as follows.
• The query link:www.dapper.com.fr yields about 250 hits. This is the home page of one of the
top museums dedicated to tribal art, but it receives fewer links than our analyzed WWW site;
this is a pleasant surprise.
• The query link:www.hamillgallery.com yields about 100 hits. This is the home page of a big
informative and commercial site dedicated to African art, which shows many photos of objects,
but it receives fewer links than our analyzed WWW site; this is again a pleasant surprise.
Visibility of the WWW site as reflected by Google PageRank
The Google PageRank of a WWW page is continuously calculated by the prominent company Google
that is specialized in searching, as an indicator of the importance and impact of this WWW page. The
PageRank data are then used by Google to improve the ranking of search results. PageRank values
range from 0 to 10. See for instance “PageRank” in Wikipedia. The PageRank value of a WWW page is
mainly based on the links received from external WWW pages. So determining the number of links
received (cf. above) is related to inspecting the PageRank value.
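To illustrate why links received drive this indicator, the following is a minimal sketch of the simplified textbook PageRank iteration. It is not Google's production computation, and it yields probabilities rather than the 0 to 10 scale mentioned above. The tiny link graph, the damping factor 0.85 and the number of iterations are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base share, independent of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without out-links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: pages "a" and "b" both link to "c"; "c" links to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page that receives links from two other pages ends up with the highest score, while the page that receives no links keeps only the base share.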
We have determined the PageRank value of pages of our WWW site, as well as of other WWW pages
for comparison. To do this in practice we used PageRank Checker,
http://www.prchecker.info/check_page_rank.php, which is “a free tool to check Google page ranking of
any web site pages”. Results are shown in table format.
Table: Google PageRank values (between 0 and 10; higher is better).
http://www.vub.ac.be/BIBLIO/
The home page of the VUB university library site gets a higher value than the underlying and much
more specific sub-site that is analyzed here. This is expected as the whole library site and the
starting page in particular receives many links from all over the WWW.
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/
The home page of the sub-site analyzed here gets a value that lies among values for various
underlying pages.
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-masks.htm
The value for the page on masks is relatively high and this subject is indeed popular.
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-statues.htm
8
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-textiles.htm
The value for the page on textiles is relatively low and indeed this is a smaller page on a subject
that is not as popular as masks for instance.
http://users.telenet.be/african-shop/
The figure above on similar sites reveals this site as a near neighbor and thus as very similar and
worthy of a close look. This is the home page of a dealer in old African art.
http://www.brucefrankprimitiveart.com/
The figure above on similar sites shows this site as a near neighbor and thus as very similar and
worthy of a close look. It is the site of a dealer in old African art. Pages show photos of many
objects. Exceptionally its PageRank value could not be given by the system that was applied.
http://www.hamillgallery.com/
The figure above on similar sites shows this site. Thus it is worthy of a close look. Furthermore the
site is included in the WWW subject directory of the Open Directory Project. It is the homepage of
the large Hamill Gallery in Boston, USA, which sells African art objects. The site is well structured,
easy to use, large and informative, with many photos of objects. This makes it attractive for many
people interested in African art.
http://www.hamillgallery.com/MOSSI/MossiDolls/MossiDolls.html
This is an example of a page in the site of Hamill Gallery. The value of the PageRank is lower than
the value of the home page. This is understandable, because links made on external WWW sites
point normally to a home page and not to a more specific, underlying page.
http://www.hamillgallery.com/SITE/MasksandHeads.html
The PageRank value for the entry page to information about masks is relatively high. This
corresponds well with the case of our analyzed site, as described above.
http://www.tribal-art-auktion.de/en/home/
This is the home page of the famous auction house that is specialized in tribal art including of
course ethnic, African art. It is located in Germany and offers a very informative WWW site with
illustrated catalogs of previous and coming auctions and with results of passed auctions.
http://africa.si.edu/collections/index.htm
This is the home page of the top level, world famous National Museum for African Art, in
Washington, USA. As expected, the PageRank value is relatively high.
http://anthro.amnh.org/anthropology/databases/africa_public/africa_public.htm
This is a gateway page that gives access to the searchable database of African art and artifacts in
the great collection of the famous American Museum of Natural History in New York, USA.
3
4
5
4
2
/
5
3
4
4
6
4
http://www.dapper.com.fr/
This is the home page of the small top level museum on tribal, ethnic art in the center of Paris,
France, that organizes regularly top level exhibitions and that publishes well documented books
with contributions by the greatest experts, as catalogs to these exhibitions.
5
Comparing the PageRank values of the selected pages related to African art shows that those
values fall in the range 2 to 6 out of 10. The PageRank values of the selected pages in the analyzed
WWW site are relatively high. This is satisfactory, perhaps even surprising, considering the fame of the
other organizations. This success corresponds well with the satisfactory results on retrieval
through image searching, which are presented below.
Visibility of the WWW site, as reflected by pages indexed for WWW searching
Formulated in a negative way, WWW pages that have not been indexed by a search engine do not
appear as a result of any query through that search engine. Formulated in a more positive way, the
inclusion of your WWW pages in the database index of WWW search engines forms a basis for a high
visibility in searches (see for instance Espadas et al. 2008).
Another fact is that Google Web Search has been the leading, most popular, general
WWW text search engine for several years. The competing products offered by Yahoo! and Microsoft are less stable:
Microsoft has changed its search engine several times in recent years and both companies have
announced a cooperation in the form of a switch to one common WWW search engine database in
2010. Therefore we have checked whether the pages of the analyzed WWW site have been indexed by the
Google search engine. In practice we have submitted the query
site:www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/
The search result shows the number of pages found, and ideally this is close to the number of pages in
the site. We found that all or almost all the relevant pages of the analyzed WWW site have been
indexed by the classical WWW text search engine offered by Google. This is satisfactory, as the
WWW as a whole is far from completely indexed. The inclusion in the database index forms a basis for a high
visibility in searches.
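The check described above can be sketched as follows. The helper names are hypothetical, and the page counts in the example are invented placeholders, not measurements from this study. The sketch builds the site: query and expresses the reported number of indexed pages as a fraction of the pages that the webmaster knows to exist.

```python
def site_query(url_prefix):
    """Build a site: query that restricts results to one site or sub-site."""
    prefix = url_prefix
    for scheme in ("http://", "https://"):
        if prefix.startswith(scheme):
            prefix = prefix[len(scheme):]
    return f"site:{prefix}"


def index_coverage(pages_reported, pages_in_site):
    """Fraction of the site's pages that the search engine reports as indexed."""
    if pages_in_site <= 0:
        raise ValueError("pages_in_site must be positive")
    return pages_reported / pages_in_site


print(site_query("http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/"))
# Placeholder counts for illustration only.
print(f"coverage: {index_coverage(45, 50):.0%}")
```

A coverage close to 100 % corresponds to the ideal case mentioned above, in which the number of pages found is close to the number of pages in the site.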
In an analogous way, we have tried to assess the WWW image search engine of Google. However, this
turned out not to function in a straightforward way. The result showed a number much lower than the
thousands of images present in the WWW site. This was disappointing at first. However:
1. As a further test we submitted the same search query, but we included an extra word that
occurs in the WWW site, in what seems like an AND relation; in this way we expect an even
lower number of hits.
2. Surprisingly the number of hits did not decrease, but it increased.
3. This was tested with various other additional query words and the same pattern emerged.
All this indicates the following:
• With images.google.com, a simple test with one simple query did not work as well as in the normal,
classical Google web search. Here some clarification would be welcome.
• The initial disappointment could be replaced by a more optimistic view, because many more
images of the analyzed WWW site have been indexed by images.google.com than the first
simple test wrongly suggested.
Visibility of the WWW site, as reflected by image search results
A very direct aspect of visibility of a WWW site is the appearance of a page from the site in a WWW
search engine results list, in the case of a query with keywords in the subject domain of the site. This is
explained in more detail by Espadas et al. (2008). Visual information is important in the created WWW
site that is analyzed here, so that it is suitable to perform this analysis by searching for images on the
Internet. Several image search engines are available on the WWW. Google’s system has been applied
in this analysis, for several reasons, as explained in Nieuwenhuysen (2010b); summarized, its coverage
is good, it performs relatively well and it is very popular. The image searches were not
targeted specifically at this site; they were carried out in the way that most users search for information, unaware of
the existence of our particular analyzed site. Most searches were carried out with a query that consists
of 1, 2 or 3 words. Using only 1 word is not the best approach, because 1 word is not sufficient to
express a real information need: the relevance of the retrieved documents, in other words the
precision, would be low. Furthermore, ambiguity of meaning always hinders information retrieval and is
certainly important in the case of queries with only 1 word, as a context is lacking. On the other hand,
using more than 3 words in a query can narrow down the retrieved documents to just those that contain
those words, such as the pages in the WWW site analyzed here. Using only a few words simulates
common usage of search engines well, because most users formulate short queries, as shown by
research that is reviewed for instance by Lewandowski and Höchstötter (2008) and Machil et al.
(2008).
Afterwards, the result set for each query was inspected for occurrences of hits that point to the
analyzed WWW site.
The famous and popular Google Web Search can apply personalization in the sense that Google can
store earlier queries and other user behavior on a Google server computer or as a cookie on the client
computer; then Google can take this “older” information into account in presenting “newer” retrieval
results to the user. So a test of retrieval like the one carried out here can be influenced or hindered by
this mechanism. This personalization can be excluded by the user as explained by Google on their
WWW site. The Google WWW site shows no indication that Google Image search also works with
personalization. Anyway:
• Signing in to Google with a user id was avoided.
• Some queries were repeated on a separate, independent client computer, working with a
different IP network address and a different user id, to check if similar results were obtained.
This was indeed the case. The images and corresponding pages that were retrieved from the
analyzed WWW site ranked remarkably high. Is this perhaps still due to some kind of weak
personalization? This could be based for instance on the IP address used, as this corresponds in
many cases to a particular country. A problem here is that it is not made completely clear to
users how exactly the popular search engines function.
Google Image Search gives results in the form of small images (named thumbnails), each one
annotated with the corresponding URL of the WWW page that contains the original image. In this
analysis, we noted the rank of the first thumbnail image in such a result set, which originates from the
analyzed WWW site. In this way, a low number reflects successful retrieval. The findings are shown in
the Table.
Query
Basalampasu mask
Salampasu masks
Basalampasu masks
Bamana iron mask
Bamana iron masks
Bambara iron mask
Bambara iron masks
Bambara iron
Bambara ntomo mask
Bambara ntomo masks
Bambara ntomo
Mask Kanaga
Suku mask
Namji doll
doll Namji
doll dowayo
singiti Hemba
Mossi Boulsa mask
Boulsa mask
mask Boulsa
masque Boulsa
mask Toussian
Toussian mask
Toussian masque
Bamana iron
mask Suku
Dowayo doll
Hemba singiti
masque Toussian
masque Toussian 2
books African art
Ngere mask
Google Image search
rank
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
3
Suku hemba
sculpture Hemba
Wee mask
Bamana ntomo masks
Kanaga dogon
Hemba sculpture
African art books
Bamana ntomo mask
Namchi doll
doll Namchi
Kanaga
mask Wee
mask Ngere
Dogon kanaga
Kanaga mask
Kanaga masque
Guere mask
Bamana ntomo
mask Guere
African masks
African masks 2
Fante doll
Fante doll 2
Baule blolo
Baoule blolo
Salampasu mask
Salampasu mask 2
Salampasu mask 3
Salampasu
doll Mossi
Dogon door lock
doll Fanti
doll AND Fanti
biiga Mossi
puppet Mossi
Fanti doll
Fanti doll 2
biiga
Basalampasu
4
4
4
5
5
5
5
7
8
8
9
9
9
10
10
10
11
14
23
28
30
1 of 600
1 of 18000
1 of 18000
1 of 280
1 of 280
1 of 400
1 of 400
1 of 400
1 of 700
2 of 10000
2 of 16000
2 of 700
2 of 700
3 of 200
3 of 7000
4 of 700
4 of 700
7 of 3500
To check reproducibility, some queries were submitted not just once, but twice, after a few minutes or
after a few weeks. These are indicated in the Table with “2”. The results were satisfactory.
We observed that the sequence of words in the query can have a significant influence on the search
results (see examples in the Table). Blakeman (2010) noticed this also. We find that the ranking within
results can be different, as well as the number of results. Most users are not aware of this and do not
notice it in their applications. The phenomenon can complicate quantitative investigations and it can
be exploited in practice to obtain alternative results with the same query words.
The data of our analysis in the Table show that the WWW image search engine presents a page from the
test site among the first 20 retrieved pages that are shown on the first page of results, for almost all
queries. The significance of these observations depends of course on the total number of hits given by
the search engine. For instance, if the total number of hits is lower than 20, then it is expected,
and therefore meaningless, that the WWW page that contains the query words features among those
pages. Therefore it must be mentioned here that the total number of hits was generally much higher
than the rank number given in the Table. For instance:
• The query ‘Salampasu mask’ gives a hit from the analyzed WWW site, which is ranked number
1, while more than 400 hits are reported by Google image search.
• The query ‘Dogon door lock’ gives a hit from the analyzed WWW site, which is ranked number
2, while more than 16 000 hits are reported by Google image search.
In conclusion, this has demonstrated that the contents of our specific analyzed WWW site can be
discovered quite well through image searching.
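The rank indicator used in this section can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, the result list is supposed to have been noted by hand as an ordered list of page URLs, and the example URLs other than those of the analyzed site are invented. The first function returns the rank of the first hit that originates from the analyzed site; the second applies the caveat that a rank is only meaningful when the total number of hits exceeds the 20 results shown on the first page.

```python
def first_rank(result_urls, site_fragment):
    """Rank (starting at 1) of the first result from the analyzed site.

    Returns None when no result URL contains the site fragment.
    """
    for rank, url in enumerate(result_urls, start=1):
        if site_fragment in url:
            return rank
    return None


def is_informative(rank, total_hits, first_page_size=20):
    """A rank only says something when not all hits fit on the first page."""
    return rank is not None and total_hits > first_page_size


# Invented result list for illustration.
results = [
    "http://www.example.org/gallery/door-locks.html",
    "http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-masks.htm",
    "http://www.example.com/shop/item-17.html",
]
rank = first_rank(results, "nieuwenhuysen/african-art")
print(rank, is_informative(rank, total_hits=16000))
```

A low rank combined with a high total number of hits is the situation reported for most queries in the Table.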
Visibility of the WWW site through subject directories
When a subject directory on the WWW has selected a WWW page or site and has included a
descriptive entry about it, then this indicates an appreciation by the creators of the directory.
Furthermore this increases the web presence and visibility of the included site in several ways:
• Users can discover and access the site through the directory.
• Search engines based on crawlers can easily discover the included site and will probably
harvest it and include its contents in their search engine database.
• The existence of the link in the directory probably increases the Google PageRank or analogous
values used in other search engines.
Therefore it was checked if the analyzed WWW site has been included in some famous online subject
directories. This can be considered as an important part of the more general link analysis presented
above.
The classical Yahoo! general subject directory does not include the site. Probably the site must be manually
suggested online to Yahoo! for consideration and human editorial review, to have any chance of being
included (cf. ‘Search engine optimization’ in Wikipedia).
The general subject directory Open Directory Project (ODP) http://www.dmoz.org/ includes a link to
the WWW site http://www.vub.ac.be/BIBLIO/nieuwenhuysen/ that includes the analyzed WWW sub-site.
The general Google Directory http://www.google.com/dirhp is based on data copied from the Open
Directory Project, but shows these in a different sequence that is based on the Google PageRank of the
included WWW sites. Indeed, this directory shows the same link as ODP. Furthermore, in the section of
the directory where the analyzed site is included, the site has ranked among the top 3 for a few
years.
The general academic, scholarly subject directory of the University of California, named ‘Infomine’,
includes an entry about one of the WWW pages that are included in the analyzed WWW site:
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-links.html
The general academic, scholarly subject directory created in the United Kingdom, named ‘Intute’, does
include the entry http://www.intute.ac.uk/cgi-bin/fullrecord.pl?handle=artifact7726 about a particular
page of the WWW site, namely the page devoted to African masks
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-masks.htm. The fact
that this page was selected corresponds well with the high value for PageRank, that is observed for this
page, as reported elsewhere in this paper.
The subject directory of a university library in the USA also includes a link to the analyzed WWW site:
http://library.agnesscott.edu/help/subjects/african_art.htm
Besides the links from directories, a link is also present in the popular free encyclopedia Wikipedia:
http://it.wikipedia.org/wiki/Maschere_tradizionali_africane
In conclusion, the analyzed WWW site has been discovered, appreciated and included in several
significant academic subject directories. Therefore the visibility through subject directories on the
WWW is pleasing.
Visibility of the WWW site, as reflected by usage
Many statistical data related to usage of the WWW site can be collected and inspected in various ways,
as explained for instance by Espadas et al. (2008) and Aguillo (2009). To realize this in practice, several
WWW based statistical analysis systems have been incorporated in most pages of the analyzed WWW
site and have been used over the past years, all free of charge. Of those systems, one that has become
available recently is Google Analytics http://www.google.com/analytics/ . This system has been
evaluated as relatively powerful and easy to use by many colleagues (see for instance Aguillo, 2009)
and also by myself. So this system was applied to collect the data related to usage, which are reported
here.
During the last months of 2009 and the first months of 2010, usage was fairly constant. On average,
2000 to 3000 visitors were counted per week, each of whom looked at 1 to 2 pages. Almost half of the visits
came from the USA and only 3 % from Africa; this of course reflects not only interest in African art, but
also the penetration of Internet technology into the population.
One WWW page received about half of all visits and these lasted on average 3 minutes:
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-masks.htm.
The usage analysis has also revealed that the sources of usage are mainly WWW search engines (about
90 %) and that most of the search queries that generate a visit include the words “African masks”.
Usage of the pages that offer bibliographical information about books published on African art is lower,
as expected, but many of these pages do receive about 10 visits per week.
A comparison with usage of other sites is not straightforward, as data can be collected from Google
Analytics only by the webmaster of a site.
From these statistical usage data and from the more individual, personal contacts with users of the
informative analyzed WWW site, we conclude that the site is successful enough to justify its
maintenance and further development.
Discussion
The various aspects of the analysis yield various indicators and these shape a coherent, consistent
picture. For instance, one particular page of the site receives most attention and usage,
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/african-art/african-art-collection-masks.htm and this
agrees with various considerations and observations that are reported above:
- This page offers relatively rich information content in the form of text as well as images.
- The information provided is not only interesting for amateurs of African art, but also for
  children and others who want to make a mask for instance.
- The number of hyperlinks received from other pages on the WWW is relatively high.
- The Google PageRank value of this page is relatively high.
- This page and not the whole site or another page has been selected and included by the
  subject directory ‘Intute’.
- This page received about half of all visits, as shown by Google Analytics. Also the sources of
  usage are mainly WWW search engines and most of the search queries that generate a visit
  include the words “African masks”.
Conclusions and recommendations
1. The results show the following. The view of the Internet as a difficult and highly competitive place to
offer information is justified, but this should not lead to pessimism and passivity. Even with a simple
and cheap approach, contributions to the Internet can reach users and be useful, even to users who
rely on common, simple search methods. So an optimistic view is justified for all information providers
and libraries in particular. The fact that the WWW site built by the author is well visible and well used
indicates that this aim can also be reached by more professional and well funded digital libraries that
hire specialized personnel to develop their site. Ideally, good digital libraries should come to the WWW
and should take a more prominent rank in the lists of search results. Authors, publishers and librarians
who add information to the WWW that includes a significant part of visual information should ideally
do this in such a way that image searching can be used as one of the possible methods to unlock the
information sources.
2.a. Corresponding to this conclusion, it is recommended that other webmasters build and improve
their sites in similar ways, i.e. according to most of the guidelines that are published in the form of
articles, books and documents on the WWW, most of which have the words “Search Engine
Optimization” or the abbreviation “SEO” in their title. Examples are:
- the books by Thurow (2003) and Lieb (2009)
- the brief, concentrated, clear Google’s Search Engine Optimization Starter Guide (2008) that is
  available online free of charge
- the brief list given by Espadas et al. (2008)
The focus of such guides is primarily on search engines, but most guidelines are also applicable and
useful to increase the quality of the real user experience. The following is a selection of
guidelines that are relatively easy to apply and probably efficient:
- Offer unique content or services.
- Consider creating pages in English or another important language that is used by your target
  audience, instead of the language used in the country where the WWW site is created and
  hosted.
- Host the site on a server of a well respected organization, one that is harvested regularly by
  search engines and that runs fast and almost continuously.
- Ideally create an HTML-title for each page that is unique, clear, brief, descriptive, significant,
  and accurate.
- See that each page corresponds with a clear, brief, descriptive and user-friendly URL.
- Place your web pages in a simple, hierarchical folder/directory structure that is easy to
  navigate by users.
- Use mostly text for navigation (and not drop-down menus, images or animations).
- Avoid deep nesting of subdirectories.
- See that a user can navigate successfully by removing a part from the URL, in the hope of
  finding more general contents.
- Each anchor text (the text that you use to give users an idea about the target of a link) should
  be brief but clear, descriptive and significant, and not generic and meaningless.
- Format links in such a way that they are easy to spot. Users should be able to distinguish
  between regular text and anchor text.
- Use HTML-heading tags appropriately.
- Use a WWW site format that does not hinder visibility. The analyzed WWW site consists of
  static web pages linked together. So the site belongs to the well visible web and not to the
  obscure, invisible web that is not covered and harvested well by the popular, classical WWW
  search engines (Sherman and Price, 2001). Storing information as a digital library in a database
  management system, often named content management system, may not be the optimal
  method in this context. In that case, visibility will depend on the particular management
  software used.
- Offer significant content on your pages. In the analyzed WWW site, most pages are much
  longer than can be displayed on a screen and include a lot of information. In other words, the
  information that is offered is not scattered over numerous smaller pages, which would
  probably lead to a lower PageRank for each individual page.
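Some of the guidelines above (HTML-title, heading tags, anchor text) can be checked mechanically. The following is only a minimal sketch in Python, not the procedure used for the analyzed WWW site. The names PageAudit, audit_page and GENERIC_ANCHORS are hypothetical, the list of "generic" anchor texts is an assumption, and requiring at least one h1 heading is only one possible reading of "use HTML-heading tags appropriately".

```python
from html.parser import HTMLParser

# Assumption: anchor texts that say nothing about the link target.
GENERIC_ANCHORS = {"click here", "here", "more", "read more", "link"}
HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class PageAudit(HTMLParser):
    """Collects the title, the heading tags and the anchor texts of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []          # heading tag names in document order
        self.anchors = []           # visible text of each <a href=...>
        self._in_title = False
        self._anchor_parts = None   # a list while inside a link, else None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in HEADING_TAGS:
            self.headings.append(tag)
        elif tag == "a" and dict(attrs).get("href"):
            self._anchor_parts = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._anchor_parts is not None:
            # Join the text pieces and collapse runs of whitespace.
            text = " ".join("".join(self._anchor_parts).split())
            self.anchors.append(text)
            self._anchor_parts = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._anchor_parts is not None:
            self._anchor_parts.append(data)


def audit_page(html):
    """Return a list of warnings for one HTML page given as a string."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    warnings = []
    if not parser.title.strip():
        warnings.append("missing or empty <title>")
    if "h1" not in parser.headings:
        warnings.append("no <h1> heading")
    for text in parser.anchors:
        if not text or text.lower() in GENERIC_ANCHORS:
            warnings.append(f"generic or empty anchor text: {text!r}")
    return warnings
```

Calling audit_page on the HTML source of a page returns an empty list when none of these three checks fails. The other guidelines, such as unique content or a well respected host, still require human judgment.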
Some web development guidelines are specific for images:
- Offer each image as a separate file, and not hidden in a larger container file in a format such as
  the popular Adobe PDF or Microsoft Word DOC or DOCX or Microsoft PowerPoint PPT or PPS
  or PPTX.
- Use a file format for each image file, which can be interpreted by common, popular Internet
  browsing software.
- Use a brief but meaningful, descriptive file name for each image file.
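The image guidelines can also be turned into a simple check on file names. The sketch below is illustrative only. The function name check_image_file is hypothetical, the set of browser-friendly extensions is an assumption, and the pattern for non-descriptive names (camera-style names such as "IMG_0012") is a rough heuristic, not a rule taken from the cited guides.

```python
import os
import re

# Assumption: image formats that common browsers display directly.
BROWSER_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
# Container formats named in the guideline above.
CONTAINER_EXTENSIONS = {".pdf", ".doc", ".docx", ".ppt", ".pps", ".pptx"}
# Heuristic: an optional generic prefix followed only by digits.
NON_DESCRIPTIVE = re.compile(r"^(img|dsc|image|scan|picture)?[-_ ]?\d+$",
                             re.IGNORECASE)


def check_image_file(filename):
    """Return a list of problems found in one file name."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    ext = ext.lower()
    problems = []
    if ext in CONTAINER_EXTENSIONS:
        problems.append("container file, not a separate image file")
    elif ext not in BROWSER_IMAGE_EXTENSIONS:
        problems.append("format may not be displayed by common browsers")
    if len(stem) < 4 or NON_DESCRIPTIVE.match(stem):
        problems.append("file name is not descriptive")
    return problems
```

For example, a name such as "african-mask-front-view.jpg" passes all checks, whereas "IMG_0012.JPG" is reported as not descriptive.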
2.b. The visibility of the resulting WWW pages can be analyzed in various ways, as described for
instance in Espadas et al. (2008), Aguillo (2009) and in this paper. Applying all methods together yields a
useful view on the performance in terms of visibility and presence of your WWW site. If such a view is
also needed for your web site, then it is recommended to perform a similar analysis.
References
Aguillo, Isidro (2009)
Measuring the institution’s footprint on the web.
Library Hi Tech, Vol. 27, No. 4, 2009, pp. 540-556. DOI 10.1108/073788309.
Blakeman, Karen (2010)
On the net: All change on the search front.
Online Magazine, March/April 2010, pp. 44-47.
De Andrés, Javier, Pedro Lorca, and Ana B. Martínez (2009)
Economic and financial factors for the adoption and visibility effects of Web accessibility: The case of
European banks.
Journal of the American Society for Information Science and Technology, Vol. 60, No. 9, pp. 1769-1780,
2009, DOI: 10.1002/asi.21103
Espadas, Javier, Coral Calero, and Mario Piattini (2008)
Web site visibility evaluation.
Journal of the American Society for Information Science and Technology, Volume 59, Issue 11,
September 2008, pp. 1727-1742. DOI 10.1002/asi.20865.
Google’s Search Engine Optimization Starter Guide (2008) [online]
Available free of charge from: http://google.com/ in the form of one PDF file.
Lewandowski, D., and N. Höchstötter (2008)
Web Searching: A Quality Measurement Perspective.
In Web Search. (edited by A. Spink and M. Zimmer) Berlin Heidelberg : Springer, 978-3-540-75828-0
(Print) 978-3-540-75829-7 (Online), DOI 10.1007/978-3-540-75829-7_16, pp. 309-340.
Lieb, Rebecca (2009)
The Truth about Search Engine Optimization.
Que Publishing, ISBN 0789738317, 9780789738318, 208 pp.
Machill, Marcel, Markus Beiler and Martin Zenker (2008)
Search-engine research: a European-American overview and systemization of an interdisciplinary and
international research field.
Media, Culture & Society, Vol. 30, No. 5, pp. 591-608.
Nieuwenhuysen, Paul (2008)
Internet federated search engines for bookseller databases: a comparative evaluation.
In Intelligence, Innovation and Library Services, Proceedings of the Fourth Shanghai International
Library Forum = SILF2008 = Shanghai Library, Shanghai, China, October 20-22, 2008. Shanghai Scientific
and Technological Literature Publishing House, 2008. 371 pp.
ISBN 978-7-5439-3671-3, pp. 340-348.
Nieuwenhuysen, Paul (2010a)
Printed books and the WWW.
In the proceedings of the annual BOBCATSSS international conference on library and information
science, in 2010 hosted by the Universita degli Studi di Parma, Italia/Italy, 25-27 January 2010.
Available online free of charge from: http://dspace-unipr.cilea.it/handle/1889/1273 or
http://hdl.handle.net/1889/1273 PDF file, 9 pp.
Nieuwenhuysen, Paul (2010b)
Information retrieval via WWW image searching: a reality check.
In the proceedings of the 2010 International Conference on Information Retrieval and Knowledge
Management, CAMP’10 “Exploring the Invisible World”, at the Shah Alam Convention Centre, in
Malaysia, 16-18 March 2010, hosted by the Universiti Teknologi MARA and supported by the IEEE
Computer Society, edited by Zainah Abu Bakar et al., 2010, pp. 73-78.
PageRank. [online]
Available free of charge from: http://en.wikipedia.org/wiki/PageRank
Search engine optimization. [online]
Available free of charge from: http://en.wikipedia.org/
Sherman, Chris, and Gary Price (2001)
The invisible web: uncovering information sources search engines can’t see.
Medford : Information Today, Cyberage Books, 2001, 439 pp.
Thurow, Shari (2003)
Search engine visibility.
Indianapolis : New Riders, 2003, 297 pp.
Author / presenter:
Since 1983, Paul Nieuwenhuysen has been a full-time member of the academic staff at the Vrije
Universiteit Brussel, currently as professor.
These days his functions include: member of the management board of the University Library, science
and technology librarian, teacher of courses on online information retrieval and presentation.
At the University of Antwerp inter-university postgraduate program in Information and library science,
he has been guest professor responsible for courses on information technology and on the information
market until 2009.
At the University of Antwerp he received the degrees of Licentiaat in Physics in 1974, Doctor in Science
in 1979, the Belgian post-doctoral degree in 1983, and the inter-university postgraduate degree in
“Documentation and library science” in 1986.
He has been project leader of a 10 year co-operation with the National Agricultural Library of Tanzania
and he organizes international training courses on management of information in science and
technology.
He is single author or co-author of more than 40 refereed publications in international
scientific/technical journals and conference proceedings. In the area of information science, he has
been a consultant for various international agencies, and he is a member of several societies, of the
program committee of several international conferences, and of the editorial board of several
academic and professional journals.