WEBQUAL: A MEASURE OF WEBSITE QUALITY¹
Eleanor T. Loiacono, Worcester Polytechnic Institute, Worcester
Richard T. Watson, University of Georgia, Athens
Dale L. Goodhue, University of Georgia, Athens
ABSTRACT
The paper presents WebQual™, a Web site quality measure with 12 dimensions. Development was based on an extensive literature review and on interviews with Web designers and Web visitors. The instrument was refined using two successive samples, and the validity of the final instrument was tested with a third, confirmatory sample.
INTRODUCTION
No existing comprehensive instrument is designed specifically to capture the consumer’s perception of Web site quality for the purpose of predicting whether the consumer will reuse the site. This paper seeks to address that gap, using as its general underlying model the Theory of Reasoned Action (TRA) (Ajzen et al. 1980; Fishbein et al. 1975), and particularly the TRA as applied to information technology utilization in the Technology Acceptance Model (TAM) (Davis et al. 1989).
The Theory of Reasoned Action (Ajzen et al. 1980; Fishbein et al. 1975) states that individuals' behavior (in this case, revisiting or purchasing from a Web site) can be predicted from their intentions, which in turn can be predicted from their attitudes toward the behavior and from subjective norms. Following the chain of prediction further back, attitudes can be predicted from an individual's beliefs about the consequences of the behavior, and subjective norms can be predicted from whether significant others think the behavior should or should not be performed. TRA is a very general theory and, as such, does not specify which beliefs are pertinent in a particular situation.
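For readers who want this chain in symbols, TRA is conventionally written in the expectancy-value form below (Fishbein et al. 1975). The notation is ours rather than this paper's: BI is behavioral intention, A_B the attitude toward the behavior, SN the subjective norm, b_i the belief that the behavior leads to consequence i, e_i the evaluation of that consequence, nb_j the belief that referent j thinks the behavior should be performed, and mc_j the motivation to comply with referent j. The weights w_1 and w_2 are estimated empirically rather than fixed by the theory, and behavior is then predicted from BI.

```latex
\begin{aligned}
BI  &= w_1 A_B + w_2 \, SN \\
A_B &= \sum_{i} b_i \, e_i \\
SN  &= \sum_{j} nb_j \, mc_j
\end{aligned}
```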
Davis (1989) applied TRA to a class of behaviors that can be loosely defined as “using computer technologies” and produced the Technology Acceptance Model (TAM). Davis argues that, for the behavior of “using computer technologies,” two beliefs are predominant in predicting behavior: perceived ease of use and perceived usefulness. We also believe that there may be multiple distinct dimensions of ease of use and of usefulness, as well as other categories of beliefs, such as “entertainment,” that together predict intentions to reuse a Web site. Identifying the relevant dimensions of WebQual and developing an effective instrument to measure them are the focus of the rest of the paper.

¹ Loiacono, Eleanor T., Richard T. Watson, and Dale L. Goodhue. 2002. WebQual™: A measure of Web site quality. In AMA Winter Conference. Austin, TX.
There are many frameworks for thinking about measurement validity. This paper uses Bagozzi (1980) and Bagozzi and Phillips (1982) because of their comprehensive coverage of six key components of validity, which are explained along with the process used to develop WebQual.
INSTRUMENT DEVELOPMENT PROCESS
The goal was to develop a valid measure of Web site quality that would predict Web site reuse.
The overall process included four stages, which together addressed each of Bagozzi’s validity
concerns.
1. We looked beyond and within the two constructs of ease of use and usefulness. That is, we examined whether other categories of beliefs also need to be considered, and whether there are distinct dimensions of “ease of use” and “usefulness” that should be considered separately.
2. We developed questions for each of the dimensions of WebQual identified in Stage 1. The
initial result was an eighty-eight-item instrument that measured 13 distinct beliefs about a
Web site.
3. The instrument was refined by administering it to two different samples (N = 511 and N = 336). After each administration, the measurement validity of the constructs was analyzed, problem questions were pruned, revised, or replaced, and redundant dimensions were collapsed. This resulted in an instrument with 36 questions measuring 12 dimensions of WebQual.
4. A confirmatory analysis of the overall measurement validity of the final instrument was
conducted using a new sample of 307 subjects. The instrument demonstrated strong
measurement validity for the four validity issues that can be empirically assessed.
STAGE 1: DEFINING THE DIMENSIONS OF WEBQUAL
To reveal the pertinent dimensions of Web site quality and establish content validity, we pursued four efforts more or less simultaneously. First, a review of the MIS and marketing literature identified existing constructs related to quality and customer satisfaction. In parallel, we conducted three exploratory research projects to ensure that the constructs were comprehensive: soliciting criteria from Web surfers, interviewing Web designers, and studying a large organization's standards for Web site design.
Initial dimensions of Web site quality
Five general categories of Web site quality emerged from the literature review and exploratory research: ease of use, usefulness, entertainment, complementary relationship, and customer service. These can be further broken down into 14 distinct dimensions of Web site quality, as shown in Table 1.
----Table 1 about here----
STAGE 2: DEVELOPING THE ITEMS
Scale development can be either deductive or inductive (Hinkin 1998). We incorporated both approaches through an extensive literature review phase (deductive) and an exploratory research phase (inductive).
An initial set of 142 candidate items was developed for the 13 constructs arising from the literature review and our exploratory studies. This list was then refined based on the approach Davis (1989) used in his pretest of measures for TAM. Mindful of the cognitive complexity of handling all 13 constructs at once (Miller 1956), we opted to simplify this initial screening: twenty experienced Web users from a large southeastern university (5 graduate and 15 undergraduate students) rated how well each item corresponded to the four high-level categories of Web site quality (ease of use, usefulness, entertainment, and complementary relationship).
A non-statistical cluster analysis (similar to Davis 1989) was then performed: an item was assigned to one of the four high-level constructs if at least fifty percent of the subjects (10 of 20) ranked it among the top three items for that construct. This resulted in an initial WebQual instrument of 88 items covering 13 possible constructs.
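The fifty-percent rule amounts to a simple vote count per item. The sketch below assumes each subject's judgment is recorded as the set of high-level categories for which that subject placed the item in his or her top three; the data layout, the function name `assign_categories`, and the example votes are hypothetical, not taken from the study.

```python
from collections import Counter

# The four high-level categories used in the screening.
CATEGORIES = ("ease of use", "usefulness", "entertainment", "complementary relationship")

def assign_categories(top3_votes: list[set[str]], threshold: float = 0.5) -> list[str]:
    """Return the categories for which enough subjects ranked the item in their top three.

    top3_votes has one entry per subject: the set of categories for which
    that subject placed this item among his or her top three items.
    """
    n_subjects = len(top3_votes)
    counts = Counter(cat for votes in top3_votes for cat in votes)
    # Keep a category only if at least `threshold` of subjects voted for it.
    return [cat for cat in CATEGORIES if counts[cat] >= threshold * n_subjects]

# Hypothetical item judged by 20 subjects: 12 put it in their top three for
# usefulness, 4 for ease of use, and 4 for neither category.
votes = [{"usefulness"}] * 12 + [{"ease of use"}] * 4 + [set()] * 4
print(assign_categories(votes))  # ['usefulness']
```

The paper assigns each retained item to one construct but does not say how an item clearing the bar for two categories would be handled, so the sketch simply returns every category that qualifies.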
STAGE 3: REFINEMENT
To prevent item-order bias, two versions of the initial instrument with different random item orders were created. To avoid inflating reliabilities through artificially high correlations, which can arise when subjects answer adjacent questions by anchoring and adjustment, items pertaining to the same construct were separated from one another. Items were measured on a seven-point Likert scale. In addition, reverse-scored items were included to keep respondents alert while completing the survey and to reduce response bias (Hensley 1998; Spector 1992).
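The two safeguards above can be sketched in a few lines. This is a minimal illustration under stated assumptions: “separated” is read as “no two items from the same construct are adjacent,” a valid order is found by reshuffling until that condition holds, and the helper names `spread_order` and `reverse_score` are hypothetical. On a seven-point scale, a reverse-worded response x is recoded as 8 − x so that higher values consistently indicate higher perceived quality.

```python
import random

def spread_order(items: list[tuple[str, str]], seed: int,
                 max_tries: int = 10_000) -> list[tuple[str, str]]:
    """Shuffle (item_id, construct) pairs until no two adjacent items share a construct.

    Assumption: 'separated' means only that same-construct items are never adjacent.
    """
    rng = random.Random(seed)  # a fixed seed makes each questionnaire version reproducible
    order = list(items)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(a[1] != b[1] for a, b in zip(order, order[1:])):
            return order
    raise ValueError("no valid order found; increase max_tries or change the item set")

def reverse_score(response: int, scale_points: int = 7) -> int:
    """Recode a reverse-worded item so that 1 <-> 7, 2 <-> 6, and so on."""
    return scale_points + 1 - response

# Hypothetical item pool: three items for each of three constructs.
items = [(f"{c}{i}", c) for c in ("trust", "visual", "speed") for i in range(1, 4)]
version_a = spread_order(items, seed=1)
version_b = spread_order(items, seed=2)
print([item_id for item_id, _ in version_a])
print(reverse_score(2))  # 6
```

Using a different seed for each questionnaire version gives two independent random orders that both satisfy the separation constraint.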
The instrument was refined by examining its reliability and discriminant validity after each of
two distinct administrations. With 88 questions in the initial questionnaire and a rule of thumb
for factor analysis of at least five times as many observations as there are variables to be
analyzed (Hair et al. 1998), at least 440 subjects were required. Data were collected from 510
undergraduates in round 1.
Subjects were given a context (e.g., “Imagine it is your friend's birthday and you are searching for a good gift: a book.”), told to explore a designated Web site as if they were deciding which book to buy for their friend, and then asked to complete the questionnaire. Three sites of each of four different types were used (12 sites in all). The sites were chosen for their variability in quality, based on rankings of specific Web sites generated by subjects in the exploratory research phase. To control for time-of-day bias, the time of day at which the sites were visited was varied.
Item assessment and purification: round 1
Data analysis and purification consisted of three steps. First, Cronbach's alpha was calculated for the measure of each of the 13 target constructs. Items found to decrease the reliability (alpha) of a construct’s measure were deleted, and the process was repeated until removing any remaining item would no longer increase that construct’s alpha. This step removed eleven items.
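For a construct measured by k items, Cronbach's alpha is α = k/(k − 1) × (1 − Σσᵢ²/σ_T²), where σᵢ² is the variance of item i and σ_T² is the variance of the summed score. The purification loop can be sketched as backward elimination on “alpha if item deleted.” This is a minimal sketch assuming complete responses stored item by item; the function names and data are hypothetical, and dropping the single most harmful item on each pass is our assumption, since the paper does not specify the order of deletions.

```python
from statistics import variance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha; items[j][i] is respondent i's score on item j."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summed scale score per respondent
    total_var = variance(totals)
    if total_var == 0:
        return float("nan")  # alpha is undefined when the summed score never varies
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / total_var)

def purify(items: dict[str, list[float]]) -> dict[str, list[float]]:
    """Drop items while deleting one still raises alpha (never below two items)."""
    kept = dict(items)
    while len(kept) > 2:
        best_name, best_alpha = None, cronbach_alpha(list(kept.values()))
        for name in kept:
            alpha_if_deleted = cronbach_alpha([v for n, v in kept.items() if n != name])
            if alpha_if_deleted > best_alpha:
                best_name, best_alpha = name, alpha_if_deleted
        if best_name is None:  # no single deletion improves alpha: stop
            break
        del kept[best_name]
    return kept

# Hypothetical 7-point responses from seven subjects; q4 is keyed in the
# opposite direction and was never reverse-scored, so it drags alpha down.
responses = {
    "q1": [1, 2, 3, 4, 5, 6, 7],
    "q2": [2, 2, 3, 5, 5, 6, 7],
    "q3": [1, 3, 3, 4, 6, 6, 7],
    "q4": [7, 6, 5, 4, 3, 2, 1],
}
print(sorted(purify(responses)))
```

In the example, the negatively keyed item q4 is removed first. With very few, nearly parallel items the rule can keep pruning until only two items remain; the function stops there because alpha needs at least two items.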
Second, to identify further internal consistency problems, items with low correlations (i.e., below .40) with measures of similar traits were removed from the instrument. Thirteen items were deleted during this step, and one was modified to clarify its meaning.
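The paper does not spell out which correlation was compared with the .40 cutoff. The sketch below assumes the common corrected item-total correlation, that is, each item's Pearson correlation with the sum of the other items intended to measure the same construct; the function names and example scores are hypothetical.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def weak_items(construct: dict[str, list[float]], cutoff: float = 0.40) -> list[str]:
    """Flag items whose corrected item-total correlation falls below `cutoff`."""
    flagged = []
    for name, scores in construct.items():
        others = [v for n, v in construct.items() if n != name]
        rest = [sum(vals) for vals in zip(*others)]  # sum of the other items, per respondent
        if pearson(scores, rest) < cutoff:
            flagged.append(name)
    return flagged

# Hypothetical responses for one construct; t4 does not track the other items.
scores = {
    "t1": [1, 2, 3, 4, 5, 6, 7],
    "t2": [2, 2, 3, 5, 5, 6, 7],
    "t3": [1, 3, 3, 4, 6, 6, 7],
    "t4": [4, 7, 1, 5, 2, 6, 3],
}
print(weak_items(scores))  # ['t4']
```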
The final step consisted of removing items that appeared to have discriminant validity problems.
Items were removed if they correlated more highly with items measuring different constructs
than they did with items in their intended construct (Campbell et al. 1959; Goodhue 1998).
Under these criteria, eight items were deleted.
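The Campbell and Fiske style rule above can be operationalized in several ways. This sketch assumes one of them: an item is flagged when its highest correlation with any item from another construct exceeds its average correlation with the other items of its own construct. The helper names, construct labels, and scores are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def discriminant_violations(data: dict[str, list[float]],
                            construct_of: dict[str, str]) -> list[str]:
    """Flag items that correlate more with another construct's items than with their own."""
    flagged = []
    for item, scores in data.items():
        home = construct_of[item]
        own = [pearson(scores, data[o]) for o in data if o != item and construct_of[o] == home]
        other = [pearson(scores, data[o]) for o in data if construct_of[o] != home]
        # Assumed rule: strongest cross-construct correlation vs. mean within-construct correlation.
        if own and other and max(other) > mean(own):
            flagged.append(item)
    return flagged

# Hypothetical items for constructs A and B; a3 behaves like a B item.
data = {
    "a1": [1, 2, 3, 4, 5, 6, 7],
    "a2": [2, 1, 3, 4, 6, 5, 7],
    "a3": [3, 6, 1, 5, 2, 7, 4],
    "b1": [4, 7, 1, 5, 2, 6, 3],
    "b2": [4, 7, 1, 5, 3, 6, 2],
}
construct_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
print(discriminant_violations(data, construct_of))  # ['a3']
```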
After the deletions, each construct was reviewed to ensure that at least five items per dimension remained. (This would allow us to drop up to two items per dimension if measurement validity problems emerged and still retain at least three items per dimension.) For underrepresented dimensions, new items closely related to the remaining ones were developed; in total, 27 items were added, resulting in an 83-item instrument.
STAGE 4: FINAL ITEM SELECTION AND ASSESSMENT OF MEASUREMENT
A second round of data collection allowed testing of the measurement validity of the second
version of the instrument. Data were collected from 336 undergraduate students. A two-step
process was employed to select the subset of the questions to be included in the final version of
WebQual.
Discriminant validity: round 2
Discriminant validity for the second version of the questionnaire was first assessed using
exploratory factor analysis (EFA). Five of the 13 constructs (information quality, fit-to-task,
interactivity, innovativeness, and business processes) appeared to have some possible
discriminant validity problems. All other constructs loaded on separate factors.
To explicitly test for discriminant validity of the five problematic constructs, we used
confirmatory factor analysis (CFA) and chi-squared difference tests. The results revealed that the
information quality and fit-to-task constructs should be combined. Thus, the outcome of
discriminant validity analysis was to reduce the WebQual measure from 13 to 12 constructs.
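A chi-squared difference test compares two nested CFA models: one in which the two candidate constructs are merged into a single factor and one in which they are kept separate. Under the usual maximum-likelihood assumptions, the difference in χ² is approximately χ²-distributed with degrees of freedom equal to the difference in df (see Steiger et al. 1985 on sequential chi-square statistics). The sketch below uses only the Python standard library, computes the upper-tail probability from the closed-form expressions for integer df, and uses invented fit statistics; the helper names are hypothetical. A significant result means merging worsens fit, so the constructs should stay separate.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the 1-df tail plus terms sqrt(2x/pi) e^(-x/2) x^(i-1) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def chi_square_difference(chi2_merged: float, df_merged: int,
                          chi2_separate: float, df_separate: int) -> tuple[float, int, float]:
    """Return (delta chi2, delta df, p) for the merged-versus-separate comparison."""
    d_chi2 = chi2_merged - chi2_separate
    d_df = df_merged - df_separate
    if d_df <= 0:
        raise ValueError("the merged model must have more degrees of freedom")
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)

# Invented fit statistics for two nested models (not values from the study).
d_chi2, d_df, p = chi_square_difference(1502.3, 1213, 1421.8, 1201)
print(f"delta chi2 = {d_chi2:.1f} on {d_df} df, p = {p:.2g}")
```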
Validity of final instrument
Once the 12 key concepts of WebQual were identified, the three items loading most highly on each factor were chosen for the final questionnaire. Because a construct should have at least three items (Cronbach et al. 1955) and we sought a parsimonious instrument (lengthy questionnaires typically have lower response rates; Babbie 1998), exactly three items were selected for each construct. This final 36-item version was used to assess the empirically testable validity concerns (the last four of Bagozzi's concerns: internal consistency, discriminant validity, convergent validity, and nomological validity).
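Selecting the top three items per factor is a simple ranking step. The sketch assumes the loadings are available as a nested mapping from factor to item to loading and ranks items by absolute loading; the function name, item labels, and loading values are hypothetical.

```python
def top_items(loadings: dict[str, dict[str, float]], k: int = 3) -> dict[str, list[str]]:
    """For each factor, return the k items with the largest absolute loadings."""
    return {
        factor: sorted(items, key=lambda item: abs(items[item]), reverse=True)[:k]
        for factor, items in loadings.items()
    }

# Hypothetical loadings; rt3 is a negatively worded item with a negative loading.
loadings = {
    "trust": {"tr1": 0.81, "tr2": 0.77, "tr3": 0.52, "tr4": 0.85, "tr5": 0.60},
    "response time": {"rt1": 0.74, "rt2": 0.69, "rt3": -0.71, "rt4": 0.44},
}
print(top_items(loadings))
# {'trust': ['tr4', 'tr1', 'tr2'], 'response time': ['rt1', 'rt3', 'rt2']}
```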
Confirmatory factor analysis
A final confirmatory factor analysis was run using a new sample of 307 undergraduate students. The results strongly support the overall fit of the model. Both the RMSEA (0.060) and the SRMR (0.053) are at or below the conservative cutoff of 0.06 for acceptable error. Furthermore, the RNI (.92) and NNFI (.91) both exceed the recommended .90 threshold, confirming good model fit.
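The three χ²-based indices can be computed from the model's χ² and df, the χ² and df of the independence (null) model, and the sample size N, using the standard definitions in the docstrings below (some programs use N rather than N − 1 in RMSEA). SRMR is omitted because it requires the matrix of residual correlations rather than summary statistics. The χ² values in the example are invented; the degrees of freedom are those a 36-item, 12-factor CFA would have if each item loads on one factor, factor variances are fixed at 1, and the null model estimates only the 36 item variances, which is an assumption about the model's specification.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rni(chi2_m: float, df_m: int, chi2_0: float, df_0: int) -> float:
    """Relative noncentrality index: 1 - (chi2_m - df_m) / (chi2_0 - df_0), not truncated."""
    return 1.0 - (chi2_m - df_m) / (chi2_0 - df_0)

def nnfi(chi2_m: float, df_m: int, chi2_0: float, df_0: int) -> float:
    """Non-normed fit index: (chi2_0/df_0 - chi2_m/df_m) / (chi2_0/df_0 - 1)."""
    ratio_0 = chi2_0 / df_0
    return (ratio_0 - chi2_m / df_m) / (ratio_0 - 1.0)

# Invented chi-square values; df follow the assumed 36-item, 12-factor specification.
N, CHI2_M, DF_M, CHI2_0, DF_0 = 307, 1000.0, 528, 8000.0, 630
print(f"RMSEA = {rmsea(CHI2_M, DF_M, N):.3f}")
print(f"RNI   = {rni(CHI2_M, DF_M, CHI2_0, DF_0):.2f}")
print(f"NNFI  = {nnfi(CHI2_M, DF_M, CHI2_0, DF_0):.2f}")
```

When χ² equals its degrees of freedom, RMSEA is 0 and both RNI and NNFI equal 1, which is a quick sanity check on any implementation.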
Internal consistency (reliability)
The reliability of the final questionnaire (12 constructs with three questions each) was calculated using Cronbach's alpha (Cronbach 1951). The alphas of the 12 constructs ranged from .72 to .93, with 10 of the 12 constructs having an alpha greater than .80; the remaining two had alphas of .79 and .72.
Discriminant validity
As a final check of discriminant validity, we tested all 66 possible pairs of the 12 remaining constructs to see whether fit improved when any pair was collapsed into a single construct. The two most highly correlated constructs were informational fit-to-task and interactivity, correlated at .90. Even so, the pairwise tests confirm that all 12 constructs are separate dimensions of a Web site's quality.
Convergent validity
Following the example of SERVQUAL's development (Parasuraman et al. 1988), subjects were asked one additional question on overall Web site quality: “Overall, how would you rate the quality of this Web site?” (a 7-point scale anchored by “poor” and “excellent”). The total WebQual score (across all 36 items) was computed and correlated with responses to this overall quality question. The two were correlated at .78 (p < .001), indicating convergent validity.
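The convergent-validity check reduces to a Pearson correlation between each respondent's total WebQual score and his or her single overall-quality rating. The sketch assumes responses are stored one row per respondent; the helper names and the small data set (four items instead of 36) are hypothetical.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def convergent_r(item_rows: list[list[int]], overall: list[int]) -> float:
    """Correlate each respondent's summed item score with the overall-quality rating."""
    totals = [sum(row) for row in item_rows]
    return pearson(totals, overall)

# Hypothetical data: five respondents, four 7-point items each, plus the overall rating.
rows = [[2, 3, 2, 3], [4, 4, 5, 3], [6, 5, 6, 6], [3, 2, 3, 4], [7, 6, 7, 7]]
overall = [2, 4, 6, 4, 7]
print(f"r = {convergent_r(rows, overall):.2f}")
```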
Nomological validity/predictive validity
Confidence in a measure increases if it behaves as expected in relation to other accepted constructs (Bagozzi 1979; Bagozzi 1980). For WebQual, predictive validity is demonstrated by testing the instrument's ability to predict a Web visitor’s intention to purchase from or revisit a Web site. The correlations of the composite WebQual measure with intention to purchase and intention to revisit were .56 and .53, respectively (both p < .0001), providing good confirmation of nomological validity.
Adequacy of model fit
Four recommended fit indices (Vandenberg et al. 2000) indicate acceptable fit for the 12-construct, 36-question model (see Table 2). These indices provide consistent, mutually reinforcing evidence of the overall adequacy of WebQual.
----Table 2 about here----
DISCUSSION
This research makes two contributions. First, it provides practitioners and researchers with a validated, reliable measure of Web site quality. The term “electronic commerce” may soon be redundant, since nearly all commerce may become electronic (Porter, 2001). Even if electronic commerce grows somewhat less than this optimistic view suggests, firms will increasingly need a means of assessing the quality of a Web site. In many cases, Web sites will shape the customer’s view of the firm (Watson et al. 1998) and could have an important impact on performance. Second, this study adds to our understanding of TAM, a widely used MIS model, by revealing the components of ease of use and usefulness. It thus provides a basis for refining TAM to increase its diagnostic power.
LIMITATIONS
WebQual’s development was based on the responses of undergraduate business students to a
selected group of Web sites. While these subjects are typical of a substantial body of Web users,
they are not a representative sample of all users. Furthermore, many of the subjects were not ongoing customers of the sites selected for assessment. These important limitations are typical of
those facing most instrument developers because such work often needs to start in an
environment where many subjects are readily and repeatedly available. Further confirmatory research is needed with broad samples of ongoing customers of a range of Web sites.
CONCLUSION
In the age of the Internet and electronic commerce, the fields of MIS and marketing need a means of assessing the effectiveness of a Web site. We believe our efforts have produced a valid and reliable instrument for measuring Web site quality. WebQual should be able to support a range of important MIS and marketing studies as researchers attempt to understand what contributes to success in the electronic marketspace. This research marks the beginning of a cumulative research program, by the authors and, we hope, by others, to understand the nature and characteristics of high-quality Web sites.
REFERENCES
Ajzen, I., and Fishbein, M. Understanding attitudes and predicting social behavior Prentice-Hall, Englewood Cliffs, NJ, 1980.
Babbie, E. The Practice of Social Research, (8th ed.) Wadsworth, Belmont, CA, 1998.
Bagozzi, R.P. “The role of measurement in theory construction and hypothesis testing: Toward a holistic model,” in: Conceptual and theoretical developments in marketing, O.C. Ferrell, S.W. Brown, and C.W. Lamb, Jr. (eds.), American Marketing Association, Chicago, IL, 1979.
Bagozzi, R.P. Causal models in marketing John Wiley, New York, 1980.
Bagozzi, R.P., and Phillips, L. “Representing and Testing Organizational Theories: A Holistic
Construal,” Administrative Science Quarterly (27:3) 1982, pp 459-490.
Campbell, D.T., and Fiske, D.W. “Convergent and discriminant validation by the multitrait-multimethod matrix,” Psychological Bulletin (56) 1959, pp 81-105.
Cronbach, L.J. “Coefficient alpha and the internal structure of tests,” Psychometrika (16:3) 1951,
pp 297-333.
Cronbach, L.J., and Meehl, P.E. “Construct validity in psychological tests,” Psychological
Bulletin (52) 1955, pp 281-302.
Davis, F.D. “Perceived usefulness, perceived ease of use, and user acceptance of information
technology,” MIS Quarterly (13:3) 1989, pp 319-339.
Fishbein, M., and Ajzen, I. Beliefs, attitude, intention, and behavior: An introduction to theory
and research Addison-Wesley, Reading, MA, 1975.
Goodhue, D. “Development and measurement validity of a task-technology fit instrument for
user evaluations of information systems,” Decision Sciences (29:1) 1998, pp 105-138.
Hair, J.F., Jr., Anderson, R.E., Tatham, R.L., and Black, W.C. Multivariate Data Analysis, Upper
Saddle River, NJ, 1998, p. 730.
Hensley, R.L. “A review of operations management studies using scale development
techniques,” Journal of Operations Management (17) 1998, pp 343-356.
Hinkin, T.R. “A brief tutorial on the development of measures for use in survey questionnaires,”
Organizational Research Methods (1:1) 1998, pp 104-121.
Hoffman, D.L., and Novak, T.P. “Marketing in Hypermedia Computer-Mediated Environments:
Conceptual Foundations,” Journal of Marketing (60:July) 1996, pp 50-68.
Miller, G.A. “The magical number seven, plus or minus two: some limits on our capacity for
processing information,” The Psychological Review (63:2) 1956, pp 81-97.
Parasuraman, A., Zeithaml, V.A., and Berry, L.L. “SERVQUAL: a multiple-item scale for
measuring consumer perceptions of service quality,” Journal of Retailing (64:1) 1988, pp 12-40.
Spector, P.E. Summated rating scale construction: An introduction Sage Publications Ltd.,
Newbury Park, 1992, p. 73.
Steiger, J.H., Shapiro, A., and Browne, M.W. “On the multivariate asymptotic distribution of
sequential chi-square statistics,” Psychometrika (50) 1985, pp 253-264.
Vandenberg, R.J., and Lance, C.E. “A review and synthesis of the measurement invariance
literature: Suggestions, practices, and recommendations for organizational research,”
Organizational Research Methods (3:1) 2000, pp 4-69.
Watson, R.T., Akselsen, S., and Pitt, L.F. “Attractors: building mountains in the flat landscape of
the World Wide Web,” California Management Review (40:2) 1998, pp 36-56.
Appendix: WebQual Items by Construct
USEFULNESS:
Informational Fit-to-Task
The information on the Web site is pretty much what I need to carry out my tasks.
The Web site adequately meets my information needs.
The information on the Web site is effective.
Interactivity
The Web site allows me to interact with it to receive tailored information.
The Web site has interactive features, which help me accomplish my task.
I can interact with the Web site in order to get information tailored to my specific needs.
Trust
I feel safe in my transactions with the Web site.
I trust the Web site to keep my personal information safe.
I trust the Web site administrators will not misuse my personal information.
Response Time
When I use the Web site there is very little waiting time between my actions and the Web site’s
response.
The Web site loads quickly.
The Web site takes long to load.
EASE OF USE:
Ease of Understanding
The display pages within the Web site are easy to read.
The text on the Web site is easy to read.
The Web site labels are easy to understand.
Intuitive Operations
Learning to operate the Web site is easy for me.
It would be easy for me to become skillful at using the Web site.
I find the Web site easy to use.
ENTERTAINMENT:
Visual Appeal
The Web site is visually pleasing.
The Web site displays visually pleasing design.
The Web site is visually appealing.
Innovativeness
The Web site is innovative.
The Web site design is innovative.
The Web site is creative.
Flow—Emotional Appeal
I feel happy when I use the Web site.
I feel cheerful when I use the Web site.
I feel sociable when I use the Web site.
COMPLEMENTARY RELATIONSHIP:
Consistent Image
The Web site projects an image consistent with the company’s image.
The Web site fits with my image of the company.
The Web site’s image matches that of the company.
On-Line Completeness
The Web site allows transactions on-line.
All my business with the company can be completed via the Web site.
Most all business processes can be completed via the Web site.
Better than Alternative Channels
It is easier to use the Web site to complete my business with the company than it is to telephone,
fax, or mail a representative.
The Web site is easier to use than calling an organizational representative agent on the phone.
The Web site is an alternative to calling customer service or sales.
Table 2: Overall Fit of the Full WebQual Model

                      RMSEA            SRMR             RNI     NNFI
WebQual Round 2       0.052            0.047            .90     .94
WebQual Round 3       0.060            0.053            .92     .91
Recommended cutoff    < 0.06 to 0.08   < 0.06 to 0.08   > .90   > .90

RMSEA = Root Mean Square Error of Approximation, SRMR = Standardized Root Mean Square Residual, RNI = Relative Noncentrality Index, NNFI = Non-normed Fit Index.
Table 1: Initial WebQual Dimensions

Higher-Level Concept         Dimension                          Description
Ease of Use                  Ease of Understanding              Easy to read and understand.
                             Intuitive Operation                Easy to operate and navigate.
Usefulness                   Information Quality                The information provided is accurate, current, and relevant.
                             Functional Fit-to-Task             Meets task needs and improves performance.
                             Interactivity                      Tailored communication between consumers and the firm.
                             Trust                              Secure communication and observance of information privacy.
                             Response Time                      Time to get a response after a request or an interaction with a site.
Entertainment                Visual Appeal                      The aesthetics of a Web site.
                             Innovativeness                     The creativity and uniqueness of site design.
                             Flow                               The emotional effect of using the Web site and intensity of involvement.
Complementary Relationship   On-Line Completeness               Allowing all or most necessary transactions to be completed on-line (e.g., purchasing over the Web site).
                             Better than Alternative Channels   Equivalent or better than other means of interacting with the company.
                             Consistent Image                   The Web site image is compatible with the image projected by the firm through other media.
Customer Service             Customer Service                   The response to customer inquiries, comments, and feedback.