
The Institutional Dimension of Document Quality Judgments
Bing Bai
Rutgers University. New Brunswick, NJ 08903. Email: bbai@paul.rutgers.edu
Kwong Bor Ng
Queens College, CUNY. Kissena Boulevard, Flushing, NY 11367. Email: kbng@qc.edu
Ying Sun
Rutgers University. New Brunswick, NJ 08903. Email: ysun@scils.rutgers.edu
Paul Kantor
Rutgers University. New Brunswick, NJ 08903. Email: kantor@scils.rutgers.edu
Tomek Strzalkowski
SUNY Albany, Western Avenue, Albany, NY 12222. Email: tomek@csc.albany.edu
Abstract

In addition to relevance, there are other factors that contribute to the utility of a document. For example, content properties such as depth of analysis and multiplicity of viewpoints, and presentational properties such as readability and verbosity, all affect the usefulness of a document. These relevance-independent properties are difficult to determine, as their estimation is more likely to be affected by personal aspects of one's knowledge structure. The reliability of judgments on these properties may decrease when one moves from the personal level to the global level. In this paper, we report several experiments on document qualities. In these experiments, we explore the correlations among judgments of nine different document qualities, to see whether a smaller number of dimensions underlies the judgments. We also investigate the reliability of the judgments and discuss its theoretical implications. We find that, between the global level of agreement and the inter-personal level of agreement, there is an intermediate level of agreement, which can be characterized as an inter-institutional level.
Overview

In addition to topical relevance, there are other factors that contribute to the utility of a document. In studies that elicit users' criteria in real-life information seeking, many non-topical document properties are found to be crucial to users' selection of documents. Schamber (1991), after interviewing 30 users, identified ten groups of criteria: accuracy, currency, specificity, geographic proximity, reliability, accessibility, verifiability, clarity, dynamism, and presentation qualities. Barry (1994) describes experiments showing that users consider many factors, including some that are non-topical, such as depth, clarity, accuracy, and recency. Bruce (1994) identifies "information attributes" (accuracy, completeness, timeliness, ...) that information users might use in their relevance judgments. Other examples include novelty of ideas (Harman, 2002; Cronin and Hert, 1995), ease of use and currency of information (Malone and Videon, 1997), and timeliness and completeness of data (Strong, Lee, and Wang, 1997). All of these properties can affect the overall usefulness of a document. Among the different document properties, we are particularly interested in two types: properties related to the content quality of a paper (e.g., depth of analysis, multiplicity of viewpoints) and properties related to the style or presentational quality of a paper (e.g., readability, verbosity).

In our previous studies (Ng et al., 2003; Tang et al., 2003), we identified nine document qualities deemed important by information professionals. They are shown in Table 1 (see Appendix A for the definitions):
Table 1: Nine Document Qualities

Variable   Description
ACC        Accuracy
SOU        Source reliability
OBJ        Objectivity
DEP        Depth
AUT        Author/Producer credibility
REA        Readability
VER        Verbose / Concise
GRA        Grammatical correctness
ONE        One-sided / Multi-view
Compared with determining the topical relevance of a document, these nine document qualities are arguably more difficult to assess consistently, as their estimation is more likely to be affected by one's personal cognitive style and by personal aspects of one's knowledge structure.
We have the following speculation about the consistency of judgments of these document qualities: two factors affect document quality judgments, (1) commonly-agreed-upon knowledge, which is relatively persistent and stable across different people, and (2) idiosyncratic, personal knowledge, which varies considerably across different people. Judgments are consistent when the contribution of commonly-agreed-upon knowledge dominates the decision-making process; they are inconsistent when idiosyncratic, personal knowledge dominates. Based on this speculation, we expected the degree of inter-judge agreement on document qualities to decrease when one moves from the personal level to the global level. However, in our experiments we find an intermediate level of agreement between the global level and the inter-personal level. This intermediate level can be characterized as an inter-institutional level.
In the following, we will report several analyses on the
reliability of document quality judgments. We explore the
correlation of judgments of nine different document
properties, to see if there are fewer dimensions underlying
the judgments. We also investigate the issue of reliability of
judgment and discuss its theoretical implications.
Introduction
The major concern in information retrieval is topical relevance. With the increase in the size and diversity of corpora, it is now much easier than before to retrieve relevant documents for a topic or query. As one commonly experiences in Internet searching, sometimes there are simply too many relevant documents retrieved. Therefore, criteria other than topical relevance, such as document qualities, become more and more important in providing additional constraints to filter the retrieved documents. However, unlike relevance judgment, for which several document/query collections are available for researchers to explore, each with hundreds of topics and tens of thousands of corresponding relevance judgments (e.g., TREC; see Voorhees, 2001), there are no comparable collections with document quality judgments. To investigate the nature and characteristics of various document qualities, we built several small document collections in which each document was judged on the nine document qualities. The collection sizes are not comparable to the TREC collections, but they are adequate for some preliminary experiments.
The first collection (hereafter C1) was built in the summer of 2002 (Tang et al., 2003). It contains 1000 medium-sized (100 to 2500 words) news articles extracted from the TREC collection, originally from the LA Times, the Wall Street Journal, the Financial Times, and the Associated Press. The second collection (hereafter C2) was built in the summer of 2003. It contains three types of documents: (1) 1100 documents from the Center for Nonproliferation Studies; (2) 600 articles from the Associated Press Worldstream, the NY Times, and Xinhua News (English); and (3) 500 documents from the Web. All documents in the collections were judged with regard to the nine document qualities on a Likert scale of 1 to 10. Each document was judged once at each of two sites (one at SUNY Albany and one at Rutgers). We recruited two kinds of participants to perform the judgments, experts and students. The expert sessions were completed first, and the documents judged by experts were used to train student judges to perform quality assessment at the same level as the expert judges. The expert participants were experienced news professionals and researchers in the area of journalism.
This work produced 1000 pairs of quality vectors for C1 and 2200 pairs of quality vectors for C2 (a vector consists of the nine quality variables, with values equal to the quality scores assigned by a judge from one institution, SUNY Albany or Rutgers).
In the work described elsewhere (Ng et al., 2003), we used
the fact that two judges had assessed every document to
produce a combined quality score for the document, and a
measure of confidence in that score, which was based on
the agreement between the judges. This formed the basis
for a detailed analysis of the possibility of computational
estimation of these quality scores, based on linguistic
features of the texts. That result led to the general
conclusion that it is feasible to model, with algorithms, the
scores assigned by any one judge. We were more
pessimistic about the possibility of modeling the average
score. In the work reported here, we looked further into
possible explanations of our difficulty. In particular, we
moved from a set of nine variables, representing the
average scores, to a full set of eighteen variables,
representing the nine scores assigned to a document by each
of two judges.
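The specific combining rule and confidence measure are defined in Ng et al. (2003) rather than reproduced here. Purely as an illustration of the idea, a minimal Python sketch might look like the following; the simple averaging and the linear agreement-based confidence are our own stand-ins, not the published measures.

import numpy as np

def combine_scores(score_a: np.ndarray, score_b: np.ndarray):
    """Illustrative combination of two judges' scores on a 1-10 scale.

    combined:   the mean of the two judges' scores.
    confidence: 1.0 when the judges agree exactly, falling linearly to 0.0
                at the maximum possible disagreement of 9 points.
    (Both formulas are stand-ins, not the measures used in Ng et al., 2003.)
    """
    combined = (score_a + score_b) / 2.0
    confidence = 1.0 - np.abs(score_a - score_b) / 9.0
    return combined, confidence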
Correlations Among Document Qualities

We noticed that the results from Rutgers and SUNY Albany are quite different: the quality scores assigned by the judges at the two institutions tend to diverge. In particular, the correlations between judgments assigned at the two schools are very low, while the correlations between the scores for different qualities at the same school are relatively high, as shown in Figure 1. Note that Figure 1 is based on Collection C2; Collection C1 generates graphs with very similar curves, demonstrating the same kind of relationship.

Another unexpected finding is that the patterns of correlation among the different quality judgments within each school are very similar. Figure 2 compares these patterns for the two schools.
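To make the kind of computation behind Figure 1 concrete, here is a minimal Python sketch (the paper does not specify the tooling used). The column layout, with Rutgers scores suffixed by "1" and SUNY Albany scores unsuffixed, is an assumption borrowed from the figure labels.

import numpy as np
import pandas as pd

# Assumed layout: one row per document, Rutgers columns end in "1" (e.g., "acc1"),
# SUNY Albany columns do not (e.g., "acc").
QUALITIES = ["acc", "sou", "obj", "dep", "aut", "rea", "ver", "gra", "one"]

def institution_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlations of every Rutgers score with every SUNY score."""
    ru_cols = [q + "1" for q in QUALITIES]
    suny_cols = QUALITIES
    corr = df[ru_cols + suny_cols].corr()   # full 18 x 18 correlation matrix
    return corr.loc[ru_cols, suny_cols]     # 9 x 9 cross-institution block

# Example with random data standing in for the real judgments:
rng = np.random.default_rng(0)
fake = pd.DataFrame(rng.integers(1, 11, size=(2200, 18)),
                    columns=[q + "1" for q in QUALITIES] + QUALITIES)
print(institution_correlations(fake).round(2))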
Figure 1. Correlations between Rutgers judgments and SUNY judgments (Collection C2). Quality names with the suffix "1" are Rutgers scores; the others are SUNY scores. The x axis in each graph lists all the qualities from both schools. The lines in the upper graph show the correlations of the Rutgers quality scores with all document qualities; the lines in the lower graph show the correlations of the SUNY scores with all document qualities.
Figure 1, upper panel: correlation of RU scores with RU and SUNY scores.
Figure 1, lower panel: correlation of SUNY scores with RU and SUNY scores.
Figure 2. Patterns of the quality correlations.
Figure 2 panels: Correlations between ACC and Other Qualities; between SOU and Other Qualities; between OBJ and Other Qualities; between DEP and Other Qualities; between AUT and Other Qualities; between REA and Other Qualities; between VER and Other Qualities; between GRA and Other Qualities. Each panel plots, separately for Rutgers and SUNY, the correlation of one quality with all nine qualities (ACC, SOU, OBJ, DEP, AUT, REA, VER, GRA, ONE).
Continuation of Figure 2: Correlations between ONE and Other Qualities, plotted for Rutgers and SUNY.
As we can see from the nine graphs in Figure 2, each document quality has a very specific pattern of relationships with the other eight qualities. These relationships are independent of the school: the two schools show the same patterns in all nine graphs, even though the correlations of quality scores between the two schools are very low, as shown in Figure 1.

To understand this phenomenon, we use the Kappa statistic (Cohen, 1968) to analyze the level of agreement between the two institutions. We then use factor analysis to explore the hidden or underlying dimensions. Since our design provided for one judge at each institution, this separation provides an opportunity to look for an unexpected effect of the institution on the results of the judging process. It is not uncommon, in research on information retrieval, to treat each of several aspects as a separate variable and to model it on its own. However, in a great deal of other work in communication, and in the social sciences generally, it is more usual to search for a small set of underlying factors that might explain all of the measured variables together. This approach was used in our discussion of the nine averaged variables in our previous papers, and it is applied here to the larger data set.
Kappa Test

The Kappa statistic is a way to evaluate the extent of agreement between raters. In a typical two-rater, n-response-category experiment, the data can be represented in an n × n contingency table. The Kappa statistic is then estimated as the ratio of the observed agreement minus chance agreement to the maximum possible agreement minus chance agreement.

In our case, the two raters are the judges from SUNY Albany and Rutgers, and n = 10 (the 10 possible scores assigned by a judge to a document). However, if we use the above ratio to estimate Kappa, the statistic depends entirely on the diagonal elements of the observed contingency table and not at all on the off-diagonal elements. In other words, a score difference as large as 9 (one judge assigned 10 and the other assigned 1) would be treated exactly the same as a score difference as small as 1 (e.g., one judge assigned 5 and the other assigned 6). To compensate for this effect, we use the weighted Kappa statistic as implemented in SAS. The results are summarized in Table 2.
Table 2. Weighted Kappa between SUNY Albany and Rutgers for all comparable quality pairs; results given by SAS 8.0 with the Cicchetti-Allison weight type (Cicchetti, 1972). See Appendix C for details.

Collection   Sample Size   Weighted Kappa   ASE      95% confidence limits
C1           8527          0.1480           0.0072   0.1339 / 0.1620
C2           18669         0.1592           0.0049   0.1497 / 0.1688
In Table 2, "Sample Size" is the number of quality pairs, excluding those with a score of 0. "ASE" is the asymptotic standard error, a large-sample approximation of the standard error of the Kappa statistic. The 95% confidence interval provides lower and upper bounds that contain the true value of the Kappa statistic with 95% probability. With the weighted Kappa statistic below 0.2 for both collections, the inter-institutional agreement is very low, even though we can reject, at the 95% confidence level, the null hypothesis that there is no agreement between the two raters. This is consistent with the pattern presented earlier.
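As a rough check, under the usual normal approximation the C1 interval is about 0.1480 ± 1.96 × 0.0072 ≈ (0.134, 0.162), which matches the limits reported in Table 2.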
Factor Analysis
Factor analysis is the combination of a rigorous
mathematical concept with a set of rules and procedures
that have grown up over years of practice. The rigorous
mathematical concept is that the full set of variables can be
manipulated to produce an eighteen-by-eighteen matrix,
which is regarded as a linear operator in an abstract vector
space. This matrix can then be reduced to a canonical form
in which it is diagonal, and its eigenvalues are in decreasing
order. If a selection of these eigenvalues is made which
does not encompass all of them, it amounts to selecting a
particular subspace of the space defined by the eighteen
variables. The selection is made in descending order of the
size of the eigenvalues because the eigenvalues also
measure the amount of variance in the observed variables
that is explained by the factors.
The selection of some of the eigenvalues amounts to
selecting a subspace, and after that the investigator is free
to choose a basis set in that subspace. For purposes of our
discussion, it is easiest to use the orthogonal basis defined
as a result of the first eigenvalue decomposition.
The first analysis was an 18-variable factor analysis; the 18 variables are the 9 quality scores from Rutgers and the 9 from SUNY Albany. The method was principal component analysis (PCA) on the correlation matrix. Tables 3 and 4 show the first 6 factors and their contributions to the 18 variables.
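For readers who want to reproduce this kind of analysis outside SPSS, here is a minimal Python sketch of PCA on the correlation matrix. The function name and the convention of scaling loadings by the square root of the eigenvalues are our own choices for illustration, not part of the original analysis.

import numpy as np

def pca_on_correlation(scores: np.ndarray, n_factors: int = 6):
    """PCA of an (n_documents x 18) score matrix via its correlation matrix.

    Returns the leading eigenvalues, the percentage of variance each one
    explains, and the corresponding loadings (one column per factor).
    """
    corr = np.corrcoef(scores, rowvar=False)     # 18 x 18 correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct_variance = 100.0 * eigvals / eigvals.sum()
    # Loadings: eigenvectors scaled by sqrt(eigenvalue), as in a component matrix.
    loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])
    return eigvals[:n_factors], pct_variance[:n_factors], loadings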
Table 3. First 6 factors extracted by factor analysis of the 18 variables (PCA on the correlation matrix, SPSS 11.5).

Factor   Eigenvalue   % of variance   Cumulative % of variance
1        4.283        23.794          23.794
2        2.890        16.083          39.877
3        1.476        8.203           48.080
4        1.080        6.000           54.079
5        0.980        5.442           59.521
6        0.925        5.139           64.661
From Table 3, we see that the first 3 factors account for about 48% of the total variance and the first 6 for about 65%. Factor 1 can be considered the "General goodness" of the documents: it is positive for all the qualities, whether judged at SUNY Albany or at Rutgers. Factor 2 can be considered the "Institution Separator": all the RU scores have positive weight on it, while all the SUNY scores have negative weight. Factor 3 can be considered the "Quality Separator": RU and SUNY have the same sign for the same quality, and some qualities, such as Depth, Objectivity and Author Credibility, have not only the same sign but also similar amplitude. Figure 3 shows the distribution of the qualities on the graphs of these 3 components.
Figure 3. Regression scores of the 18 variables on the first 3 factors (PCA on the correlation matrix). C1, C2 and C3 here denote the first 3 components (not the document collections). Panels (a), (b) and (c) are views from different perspectives. The graphs were made with SPSS 11.5, with a format change to make them easier to view.
Figure 3a
Table 4: The component matrix of the first 3 factors. Quality variables with the suffix "_R" are Rutgers scores; those with the suffix "_A" are SUNY Albany scores.

Quality   Component 1   Component 2   Component 3
ACC_R     0.536         0.524         -0.0133
SOU_R     0.539         0.481         -0.0464
OBJ_R     0.543         0.401         -0.0942
DEP_R     0.410         0.266         -0.252
AUT_R     0.507         0.435         -0.0744
REA_R     0.478         0.406         0.308
VER_R     0.368         0.379         0.421
GRA_R     0.322         0.415         0.212
ONE_R     0.524         0.304         -0.275
ACC_A     0.572         -0.504        -0.0973
SOU_A     0.617         -0.375        -0.213
OBJ_A     0.549         -0.510        -0.0914
DEP_A     0.427         -0.433        -0.285
AUT_A     0.554         -0.309        -0.0988
REA_A     0.512         -0.371        0.490
VER_A     0.337         -0.373        0.577
GRA_A     0.362         -0.356        0.387
ONE_A     0.488         -0.234        -0.355
Figure 3b
Figure 3c
In Figures 3(a) and 3(b), we see that the RU scores lie above the plane C2 = 0, while the SUNY scores lie below it. In Figure 3(c), we look along the direction of negative C2 and can see that the same qualities from the two schools are quite close: the RU scores are obtained, approximately, by moving the SUNY scores in the C2 direction, with a small amount of noise added. In other words, the C1 and C3 components match very well for the same quality between the two schools, while the C2 components are quite different. This confirms the pattern shown in Figure 1. The more independent the scores from the two schools, the larger the distance between them in the direction of C2; the more similar the correlation patterns inside each school, the closer their positions in the directions other than C2. Please see Appendix B for a closer look at this effect.
The reason for such a large gap between the two schools is not yet clear. One possible explanation is that the instructions given to the human judges at the two schools were interpreted differently.
Another question arises: since the data from Rutgers and SUNY are quite different, if we are going to use the data for machine learning, to construct automatic classifiers for document qualities, what should we use as the training data? A straightforward idea is to use the average of the two. By averaging the scores from Rutgers and SUNY, we obtain 9 averaged quality scores, on which we also performed a factor analysis. Figure 4 compares the 18-variable factors with the 9-average-variable factors; we took the first 6 factors from each analysis. For the 18 variables, the 6 factors have a cumulative variance of 64.7%, while the 6 factors of the 9 average variables account for 85.9%.
Figure 4. Comparison between the 18-variable factors and the 9-average-variable factors. The x and y axes of each cell are the factor scores from the two factor analyses, respectively.
From Figure 4, we see that the following pairs of factors have very strong correlations: 18-F1/9-F1, 18-F3/9-F2, 18-F5/9-F3, and 18-F6/9-F4. Factors 2 and 4 of the 18-variable analysis do not correlate strongly with any of the 9-average-variable factors. We speculate that these two factors represent the components on which the Rutgers and SUNY scores are independent, and that they are cancelled by the averaging operation in the 9-average-variable analysis.
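The averaging step and the comparison of the two factor solutions can be sketched as follows, again as an illustration in Python rather than the SPSS procedure actually used; the unrotated component scores computed here only approximate SPSS regression-method factor scores.

import numpy as np

def factor_scores(scores: np.ndarray, n_factors: int) -> np.ndarray:
    """Scores of each document on the leading principal components of the
    correlation matrix (a simplified stand-in for SPSS factor scores)."""
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)   # standardize
    corr = np.corrcoef(scores, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    return z @ eigvecs[:, top]                  # n_documents x n_factors

def compare_factorizations(rutgers: np.ndarray, suny: np.ndarray) -> np.ndarray:
    """Correlate the 18-variable factors with the 9-average-variable factors."""
    both = np.hstack([rutgers, suny])           # n_documents x 18
    avg = (rutgers + suny) / 2.0                # n_documents x 9
    f18 = factor_scores(both, 6)
    f9 = factor_scores(avg, 6)
    # Cross-correlation block: entry (i, j) relates 18-F(i+1) to 9-F(j+1).
    return np.corrcoef(f18, f9, rowvar=False)[:6, 6:]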
Conclusion
As demonstrated in the above analyses, there is a dimension in the data that separates the document quality judgments of SUNY Albany from those of Rutgers. This dimension is perpendicular to the other dimensions, so its presence does not affect the configuration of the data points along those other dimensions. That explains why the two institutions show the same patterns of internal relationships among the nine document qualities (as shown in Figure 2) even though their scores do not correlate highly with each other. This dimension of institutional difference still puzzles us. If there is no difference in the instrument or in the data collection procedure, it may indicate some kind of structural influence of cultural background on document quality judgments. We expect to do more research in this direction to understand this phenomenon.
ACKNOWLEDGMENTS

This research was supported in part by the Advanced Research and Development Activity of the Intelligence Community, under Contract #2002-H790400-000 to SUNY Albany, with Rutgers as a subcontractor.
Appendix A: Definition of the Nine Document
Qualities
Accuracy: The extent to which information is precise and
free from known errors.
Source Reliability: The extent to which you believe that the sources indicated in the text (e.g., interviewees, eye-witnesses, etc.) provide a truthful account of the story.
Objectivity: The extent to which the document includes
facts without distortion by personal or organizational
biases.
Depth: The extent to which the coverage and analysis of
information is detailed.
Author/Producer Credibility: The extent to which you
believe that the author of the writing is trustworthy.
Readability: The extent to which information is presented
with clarity and is easily understood.
Verbose / Concise: The extent to which information is well-structured and compactly represented.
Grammatical Correctness: The extent to which the text is
free from syntactic problems.
One-sided / Multi-view: The extent to which the information reported draws on a variety of data sources and viewpoints.
Appendix B: The Component Plot of the Factor Analysis of Two Random Gaussians

The more independent the scores from the two schools, the larger the distance between them in the direction of C2; the more similar the correlation patterns inside each school, the closer their positions in the directions other than C2. To see this, we made Figure 5, which shows the component scatter plots of two variables with different degrees of correlation between them. As illustrated, the more independent the variables, the closer their positions are to the diagonal; the more dependent, the closer they are to the horizontal axis.

Figure 5. Scatter plots from factor analyses of 2 random variables, with sample size 1000. The x axis is factor 1 and the y axis is factor 2. Let N(x, y) denote the Gaussian distribution with mean x and variance y. The two random variables in panels (a), (b) and (c) are, respectively: (a) r1 = N(0,9), r2 = N(0,9), with r1 and r2 independent; (b) r1 = N(0,9), r2 = r1*0.5 + N(0,9)*0.5; (c) r1 = N(0,9), r2 = r1*0.8 + N(0,9)*0.2. Thus r1 and r2 are increasingly correlated from (a) to (c).

Figure 5a
Figure 5b
Figure 5c
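A minimal Python sketch of the simulation behind Figure 5 (without the plotting); we assume the N(0,9) terms are drawn with standard deviation 3, and the mixing weights 0.0, 0.5 and 0.8 correspond to panels (a), (b) and (c).

import numpy as np

def component_loadings(x: np.ndarray) -> np.ndarray:
    """Loadings of the two variables on the two principal components of
    their correlation matrix (rows = variables, columns = components)."""
    corr = np.corrcoef(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order] * np.sqrt(eigvals[order])

rng = np.random.default_rng(1)
n = 1000
for weight in (0.0, 0.5, 0.8):        # panels (a), (b), (c)
    r1 = rng.normal(0.0, 3.0, n)      # N(0, 9): variance 9, standard deviation 3
    r2 = weight * r1 + (1.0 - weight) * rng.normal(0.0, 3.0, n)
    print(weight)
    print(component_loadings(np.column_stack([r1, r2])).round(2))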
Appendix C: The Weight Type of the Weighted Kappa Method

The idea of the weighted Kappa test is as follows: first build a frequency table for the two raters, whose two dimensions are the possible values of the judgments by the two raters. For example, cell (3,4) contains the number of times that rater 1 gave score 3 while rater 2 gave score 4. Cells closer to the diagonal represent better agreement. The weighted Kappa formula is

$$K_w = \frac{n \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} x_{ij} - \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} m_i n_j}{n^2 - \sum_{i=1}^{r} \sum_{j=1}^{r} w_{ij} m_i n_j},$$

where $m_i$ is the sum of the $i$th row, $n_j$ is the sum of the $j$th column, $r$ is the number of possible score values, $n$ is the total number of samples, and $w_{ij}$ is the weight of the count in cell $(i, j)$.

There are two ways to assign the weights in SAS 8.0. The first, the default in SAS, is the Cicchetti-Allison kappa coefficient, defined as

$$w_{ij} = 1 - \frac{|i - j|}{\max - \min},$$

where max and min are the maximum and minimum possible scores. The second is the Fleiss-Cohen kappa coefficient, defined as

$$w_{ij} = 1 - \frac{(i - j)^2}{(\max - \min)^2}.$$

As mentioned in the main text, the weight function we used was the default Cicchetti-Allison coefficient.
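A minimal Python sketch of the weighted Kappa computation defined above (our own illustration, not the SAS procedure used to produce Table 2; the function and parameter names are ours):

import numpy as np

def weighted_kappa(rater1, rater2, min_score=1, max_score=10,
                   weights="cicchetti-allison"):
    """Weighted Kappa for two raters on an integer score scale, following
    the formula and weight definitions given in Appendix C."""
    r = max_score - min_score + 1
    # Contingency table: cell (i, j) counts rater1 == i while rater2 == j.
    table = np.zeros((r, r))
    for a, b in zip(rater1, rater2):
        table[a - min_score, b - min_score] += 1

    i, j = np.indices((r, r))
    if weights == "cicchetti-allison":
        w = 1.0 - np.abs(i - j) / (max_score - min_score)
    else:  # "fleiss-cohen"
        w = 1.0 - (i - j) ** 2 / (max_score - min_score) ** 2

    n = table.sum()
    row_sums = table.sum(axis=1)             # m_i
    col_sums = table.sum(axis=0)             # n_j
    expected = np.outer(row_sums, col_sums)  # m_i * n_j
    return ((n * (w * table).sum() - (w * expected).sum())
            / (n ** 2 - (w * expected).sum()))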
REFERENCES

Barry, C. L. (1994). User-defined relevance criteria: An exploratory study. Journal of the American Society for Information Science, 45(3), 149-159.

Bauin, S., & Rothman, H. (1992). "Impact" of journals as proxies for citation counts. In P. Weingart, R. Sehringer, & M. Winterhager (Eds.), Representations of science and technology (pp. 225-239). Leiden: DSWO Press.

Borgman, C. L. (Ed.). (1990). Scholarly communication and bibliometrics. London: Sage.

Bruce, H. W. (1994). A cognitive view of the situational dynamism of user-centered relevance estimation. Journal of the American Society for Information Science, 45(3), 142-148.

Buckland, M., & Gey, F. (1994). The relationship between recall and precision. Journal of the American Society for Information Science, 45, 12-19.
Cicchetti, D. V. (1972). A new measure of agreement between rank-ordered variables. Proceedings of the American Psychological Association, 7, 17-18.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213-220.

Cronin, B., & Hert, C. A. (1995). Scholarly foraging and network discovery tools. Journal of Documentation, 51, 388-403.

Harman, D. (2002). Overview of the TREC 2002 Novelty Track. In TREC 11 Proceedings. NIST.

Hoppe, K., Ammersbach, K., Lutes-Schaab, B., & Zinssmeister, G. (1990). EXPRESS: An experimental interface for factual information retrieval. In J.-L. Vidick (Ed.), Proceedings of the 13th International Conference on Research and Development in Information Retrieval (ACM SIGIR '91) (pp. 63-81). Brussels: ACM.

Kling, R., & Elliott, M. (1994). Digital library design for usability. Retrieved December 7, 2001 from http://www.csdl.tamu.edu/DL94/paper/kling.html.

Malone, D., & Videon, C. (1997). Assessing undergraduate use of electronic resources: A quantitative analysis of works cited (study of ten undergraduate institutions in the greater Philadelphia area). Research Strategies, 15, 151-158.

Ng, K. B., Kantor, P., Tang, R., Rittman, R., Small, S., Song, P., Strzalkowski, T., Sun, Y., & Wacholder, N. (2003). Identification of effective predictive variables for document qualities. Proceedings of the 2003 Annual Meeting of the American Society for Information Science and Technology. Medford, NJ: Information Today, Inc.

Schamber, L. (1991). Users' criteria for evaluation in a multimedia environment. In Proceedings of the American Society for Information Science (pp. 126-133). Washington, DC.

Strong, D. M., Lee, Y. W., & Wang, R. Y. (1997). Data quality in context. Communications of the ACM, 40(5), 103-110.

Tang, R., Ng, K. B., Strzalkowski, T., & Kantor, P. (2003). Toward machine understanding of information quality. Proceedings of the 2003 Annual Meeting of the American Society for Information Science and Technology. Medford, NJ: Information Today, Inc.

Voorhees, E. (2001). Overview of TREC 2001. In E. Voorhees (Ed.), NIST Special Publication 500-250: The Tenth Text REtrieval Conference (pp. 1-15). Washington, D.C.