SearchRelevancyTesting_1

1. Semantic relevancy seems to be going too deep, finding "related" individuals it shouldn't be considering or weighting
Example 1, search term: natural language
The first result is an article by Rick Bonney, an ornithologist. The article has nothing to do with natural language (in either the programming or the linguistic sense), and Rick Bonney has done no other research on natural language. However, he is the principal investigator on numerous National Science Foundation grants, and in the "awards grant" section of the NSF's individual page, the word "language" appears 38 times and "natural" 33 times.
The second result is also an article, "Control of the Supply Line," by Jeffrey Warren Roberts.
Again, it doesn't have anything to do with natural language, and the word "natural" appears only
twice on his individual page, "language" not at all. But he has over 80 publications. If you follow
the relationship from the published articles to the journals in which they were published, you can
find numerous occurrences of the word "natural." Even more occurrences of the search terms
(individually) can be found by traversing the relationship to all of his co-authors. Is this what's
happening with the semantic relevancy?
Note: the first search result relating to natural language and programming appears on page 11 of
the results.
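The traversal hypothesis can be illustrated with a toy model. The sketch below assumes, purely for illustration, that an individual's score is the total count of query terms on its own page plus every page reachable within a fixed number of relationship hops. The individuals, page texts, and the `score`/`rank` functions are hypothetical and are not taken from the search engine under test. At depth 0 the article that actually mentions natural language ranks first; at depth 1 an unrelated article linked to a term-heavy funder page overtakes it, which mirrors the pattern seen in both examples.

```python
from collections import Counter

# Hypothetical toy data: page text for each individual, and the relationships
# (authorship, journal, grant, funder) a search might traverse.
PAGES = {
    "unrelated_article": "an ornithology article about bird counts",
    "funder": "natural language " * 3 + "grant award",  # term-heavy funder page
    "relevant_article": "natural language processing survey",
}
LINKS = {
    "unrelated_article": ["funder"],
    "funder": [],
    "relevant_article": [],
}
ARTICLES = ["unrelated_article", "relevant_article"]


def score(individual: str, query: str, depth: int) -> int:
    """Count query-term occurrences on the individual's page and on every page
    reachable within `depth` relationship hops (breadth-first)."""
    terms = query.lower().split()
    seen = {individual}
    frontier = [individual]
    total = 0
    for _ in range(depth + 1):
        next_frontier = []
        for node in frontier:
            counts = Counter(PAGES[node].lower().split())
            total += sum(counts[t] for t in terms)
            for neighbor in LINKS[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return total


def rank(query: str, depth: int) -> list[str]:
    """Order the candidate articles by descending score."""
    return sorted(ARTICLES, key=lambda a: score(a, query, depth), reverse=True)


print(rank("natural language", depth=0))  # ['relevant_article', 'unrelated_article']
print(rank("natural language", depth=1))  # ['unrelated_article', 'relevant_article']
```

Limiting the traversal depth, or down-weighting terms found on linked pages, is one knob such a model exposes; whether the real engine works this way is an open question.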
Example 2, search term: renewable energy
Three of the first 4 results, and 7 of the first 10, have no clear connection to renewable energy, and only 5 of the first 25 results have the term in their rdfs:label. (For comparison, 16 of the results on page 4 have the search term in their rdfs:label.)
The first result is an article, "Revitalizing Labor in Today's World Markets," by Lowell Turner, a professor of International and Comparative Labor. The article is not about renewable energy, and Turner's profile page contains neither "renewable" nor "energy," though it does include "renewal." His ranking does not appear to come from either his publications or his grants. Besides his department, however, he is also a member of 3 graduate fields. Could the semantic search be extending through his colleagues and their publications and grants?
The third result is also an article, "Are Replacement Animals an Opportunity Area for Your Dairy?"
and it is not about renewable energy. The author, John Conway, was the principal investigator for
6 grants funded by the NY Farm Viability Institute. The NYFVI has awarded numerous grants with
the word "energy" in the title (though none with "renewable"). None of those grants were
awarded to John Conway.
The fourth result is the article "The Welfare Reform Bill and its Effects in the South," published in
the Journal of Southern Agriculture. The article has no connection with renewable energy, but one
of the article's authors, Gerald B. White, is a co-principal investigator on a grant titled "A Passive
Solar Heating System for Commercial Greenhouses." The principal investigator for that grant,
Louis D. Albright, has numerous renewable energy publications and grants.
2. Stemming Issues
Example 1, search term: biotech
The search results give the full word, "biotechnology," more weight than the actual search term. As a result, individuals with "biotech" in the rdfs:label (such as "Biotech and the Poor," "Biotech Symposium," and "Biotech Vegetables for Insect and Insect-Vectored Disease Management") don't begin appearing until page four of the results.
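One way this ordering could arise is sketched below, under the assumption (not confirmed by the testing) that the engine expands "biotech" to longer word forms and scores each expanded match the same as an exact match. Simple prefix matching stands in here for whatever stemming or expansion the engine actually performs. Apart from "Biotech Symposium", the labels are hypothetical, as are the `score` and `rank` helpers. With equal weights, a label that repeats "biotechnology" outranks a label containing the literal term; giving exact matches a higher weight reverses that.

```python
import re

# Hypothetical labels; only "Biotech Symposium" appears in the report.
LABELS = {
    "symposium": "Biotech Symposium",
    "program": "Biotechnology Program: biotechnology courses and biotechnology research",
}


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(text: str, query: str, exact_weight: float, expanded_weight: float) -> float:
    """Weight exact word matches and prefix-expanded matches separately."""
    q = query.lower()
    total = 0.0
    for tok in tokens(text):
        if tok == q:
            total += exact_weight
        elif tok.startswith(q):  # e.g. "biotechnology" for the query "biotech"
            total += expanded_weight
    return total


def rank(query: str, exact_weight: float, expanded_weight: float) -> list[str]:
    """Order the labels by descending score."""
    return sorted(
        LABELS,
        key=lambda k: score(LABELS[k], query, exact_weight, expanded_weight),
        reverse=True,
    )


print(rank("biotech", exact_weight=1.0, expanded_weight=1.0))  # ['program', 'symposium']
print(rank("biotech", exact_weight=5.0, expanded_weight=1.0))  # ['symposium', 'program']
```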
Example 2, search term: lyme disease
The first search results page looks good, but the second page brings in individuals that contain "disease" yet have no apparent relationship to Lyme disease: for example, "Plant Disease" and "Reducing the Impact of a new Coleus Disease." Some results seem to be included because they combine the word "disease" with the word "polymer," which contains the string "lyme." These include such individuals as the National Science Foundation, the Journal of Dairy Science, and the Horticulture graduate field.
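The "polymer" matches are consistent with substring matching: "polymer" contains "lyme" (po-lyme-r), so a page mentioning both "polymer" and "disease" contains both query terms if terms are matched as raw substrings rather than whole words. The sketch below contrasts the two approaches on a made-up page text; the text and the helper names are hypothetical, and whether the engine really matches substrings is an assumption.

```python
import re
from typing import Callable

PAGE = "Polymer science seminar on plant disease resistance"  # hypothetical page text


def substring_hits(text: str, term: str) -> int:
    """Count raw substring occurrences, so "lyme" is found inside "polymer"."""
    return text.lower().count(term.lower())


def word_hits(text: str, term: str) -> int:
    """Count whole-word occurrences only."""
    return re.findall(r"[a-z]+", text.lower()).count(term.lower())


def matches(text: str, query: str, hits: Callable[[str, str], int]) -> bool:
    """True if every query term occurs at least once under the given matcher."""
    return all(hits(text, term) > 0 for term in query.split())


print(matches(PAGE, "lyme disease", substring_hits))  # True
print(matches(PAGE, "lyme disease", word_hits))       # False
```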
3. People's names
For people's names, it looks like the total number of individual occurrences of the first or last name takes precedence over an occurrence of the actual full name ("first last" or "last, first").
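The suspected behavior can be sketched with a toy scorer that adds up individual name-token occurrences and gives no extra credit for the full name as a phrase. The page texts, the `token_score`/`phrase_score`/`rank` helpers, and the `boost` value of 10 are all hypothetical and are not the engine's actual scoring. Under token counting, a building page that repeats "smith" and "david" outranks the person's own page; adding a bonus for the exact "first last" or "last, first" forms reverses the order.

```python
import re
from typing import Callable

# Hypothetical page texts; the real individual pages are not reproduced here.
PAGES = {
    "auditorium": "Goldwin Smith Hall. Smith lecture, David lecture, Smith seminar, David seminar.",
    "person": "Smith, David. Professor of Chemistry.",
}


def token_score(text: str, query: str) -> int:
    """Sum occurrences of each query word, ignoring word order."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(words.count(t) for t in query.lower().split())


def phrase_score(text: str, query: str, boost: int = 10) -> int:
    """Token score plus a bonus for each "first last" or "last, first" occurrence."""
    first, last = query.lower().split()
    lowered = text.lower()
    phrase_hits = lowered.count(f"{first} {last}") + lowered.count(f"{last}, {first}")
    return token_score(text, query) + boost * phrase_hits


def rank(scorer: Callable[[str, str], int], query: str) -> list[str]:
    """Order the pages by descending score under the given scorer."""
    return sorted(PAGES, key=lambda k: scorer(PAGES[k], query), reverse=True)


print(rank(token_score, "david smith"))   # ['auditorium', 'person']
print(rank(phrase_score, "david smith"))  # ['person', 'auditorium']
```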
Example 1, search term: robert johnson
Robert Johnson, the track coach, is the 13th search result; Robert L Johnson, the research support specialist, is the 16th.
Example 2, search term: david smith
Lewis Auditorium (Goldwin Smith Hall) is the first result. It seems to rank higher than any actual David Smith because its individual page includes more occurrences of the words "smith" and "david."
4. Other anomalies
Example 1, search term: molecular medicine
The third result is the Music graduate program, and the tenth result is the Music department.
Also, the "52nd Annual American Society of Tropical Medicine and Hygiene Meeting" is ranked
higher (21st) than nine molecular medicine seminars (29th - 36th). This seems a bit counterintuitive.
Example 2, search term: avian influenza
The first page of results looks good, but there seem to be some weighting issues. For example, the "Society for Risk Analysis Annual Meeting," which included only one presentation on avian influenza, ranks higher than the research grant "HUMAN DIMENSIONS OF HIGHLY PATHOGENIC AVIAN INFLUENZA."