A Digital Soapbox?
The Information Value of Online Physician Ratings
Guodong (Gordon) Gao and Brad N Greenwood
Robert H. Smith School of Business
University of Maryland
Jeffrey McCullough
School of Public Health
University of Minnesota
Ritu Agarwal
Robert H. Smith School of Business
University of Maryland
September 2011
Working Paper
Please do not quote or cite without permission.
A Digital Soapbox?
The Information Value of Online Physician Ratings
Abstract
The internet has fostered widespread information transparency and a rapid rise in consumer-generated ratings. While these ratings, in the form of digital word-of-mouth, provide a
potentially useful source of information to guide consumer decision making, they may be subject
to bias. Notably controversial are online consumer ratings in healthcare services, where the
consequences of poor information are substantial. Drawing on the quality transparency literature
in healthcare and word-of-mouth, we examine the information value of online physician ratings.
We utilize a novel dataset that combines offline survey ratings of physician quality with
consumer-generated online ratings to investigate two potential biases: (1) selection of rated
physicians; and (2) selection of opinions expressed. We find that although ratings provided
online are correlated with offline population ratings, online patients are less likely to rate
physicians of low perceived quality. Additionally, the opinions expressed online, when
compared to the opinions of the offline population, tend to exaggerate physician ratings. The
existence of these biases limits the information value of online ratings. Our findings highlight
the general intrinsic biases associated with online ratings and contribute to the growing body of
research on user-generated content. Results also have implications for policy related to reducing
information asymmetry in healthcare services.
Key Words: online ratings, physician quality, online word-of-mouth, bias estimation
A Digital Soapbox?
The Information Value of Online Physician Ratings
1. Introduction
In the past decade, the emergence of Web 2.0 technologies has fostered rapid growth of
online ratings by consumers. While online reviews for consumer goods such as books and
movies are well-known, online ratings are increasingly expanding to professional services such
as car mechanics, lawyers, and physicians. Because consumers experience difficulty in
ascertaining the utility or impact of these goods, the transaction between trading partners is
typically characterized by significant information asymmetry and considerable risk for the buyer.
In such contexts, the availability of easy-to-access online consumer reviews should, in theory,
create welfare for consumers by providing them with important information to guide decision-making. It is therefore not surprising that so-called “expert” services, which possess substantial
experience and credence qualities, have been characterized as “natural candidates” for word-of-mouth (WOM) communication among consumers (Harrison-Walker 2001).
Within the professional services spectrum, healthcare is experiencing striking growth in
terms of online WOM, specifically, with regard to online ratings of physicians. Today, over
forty online destinations, including Angie’s List, healthgrades.com, RateMDs.com, Vitals.com,
Yelp.com, and Zagat, offer opportunities for patients to rate physicians online. Patients have
welcomed the growth of online physician ratings as a convenient source of information about
physician quality. Even though there is scholarly evidence that not all physicians are equal and
that quality is subject to significant variation (Gawande 2002), to date, very little information
about individual physician quality manages to make its way into the public domain. While much
effort has been expended on improving healthcare quality transparency in institutions such as
hospitals and nursing homes (Harris et al. 2008; Jha et al. 2005), less has been done at the level
of individual physicians. In the absence of other channels for discerning physician quality,
online physician ratings are gaining popularity among patients. The public demand for this
information is accelerating: a recent survey found that 61% of US adults have looked online for
health information, with 24% of those consulting online rankings or reviews of physicians or
other providers (Fox et al. 2009).
Despite the increasing popularity of online physician ratings among consumers, their
emergence has generated debate and controversy. Most physicians are apprehensive about being
rated online by patients (Levine 2009). In the United States, professional societies such as the
American Medical Association (AMA), as well as some state governments, have expressed concern that
these ratings merely reflect unhappy patients’ opinions and may ruin physicians’ reputations, and
physicians have even resorted to lawsuits against these rating websites (Solomon 2008). When
the National Health Service (NHS) in the United Kingdom enabled a physician rating function
on an NHS-run website, it created significant disagreement and heated argument (Bacon 2009;
McCartney 2009) regarding the veracity, effectiveness and content of these reviews.
Disagreement among policymakers, consumers, and physicians about online physician
ratings centers on the question of their real value in informing patients about physician quality.
Although it may be a legitimate concern that such ratings are biased, and therefore might do
more harm than good in terms of educating consumers, there exists limited empirical
investigation, to date, of the information value of these ratings. This study seeks to fill this void,
and provides one of the first analyses of the information value of online physician ratings.
Based on an extensive literature survey, we first identify two biases that online ratings
might suffer from: selection of physicians to rate, and selection of opinions to express. We then
construct a novel dataset, which combines physician characteristics, online ratings, and local
demographic variables. Importantly, we match individual physicians’ online ratings to an
extensive offline patient survey that supports analyses comparing online reviews to offline
population opinions. Our data set allows us to assess the presence and magnitude of the two
selection biases while controlling for a robust set of confounding factors.
Our findings not only yield important implications for healthcare policy; they also
contribute to the online word-of-mouth research discourse. Despite the broad availability of
online reviews on almost every retail website, and growing evidence regarding the relationship
between reviews and outcomes, we have surprisingly little knowledge about the nature of these
reviews (Moe et al. 2010). This lack of understanding has resulted in a dearth of more
sophisticated metrics that associate reviews with sales, of websites that can better facilitate
knowledge sharing among consumers, and of systems to help firms manage online consumer
reviews (Chen et al. 2005). The insights generated from this study shed light on these important
issues as well.
The rest of this paper proceeds as follows. We begin with a brief overview of two
streams of literature that provide the foundation for the study: quality transparency in healthcare
services, and online WOM. Section 3 builds upon these bodies of work and develops the
research hypotheses. In Section 4 we describe the data and the empirical strategy, and present
results and robustness checks. Section 5 discusses the implications of the findings and concludes
the paper.
2. Background and Prior Literature
This study draws on two streams of prior work: quality transparency in healthcare, and
online word-of-mouth. Research in the former stream exists across the economics, healthcare
management and informatics, and information systems literatures, while the latter body of work
is primarily in the marketing and, more recently, in the information systems literature.
2.1 Quality Transparency in Healthcare
Among the range of expert services that consumers utilize, such as lawyers, plumbers,
and car mechanics, healthcare services is arguably a category where consumer choice may have
life and death ramifications. To the degree that physicians serve as patients’ agents, providing
expert knowledge and judgment, choosing a good physician can have wide-reaching clinical
implications. Because this decision is wholly up to the patient, transparent and accurate quality
measures are crucial in selecting a health care provider. Quality measures may also affect
healthcare providers by influencing their compensation and reputation. Challenges arise because
physician quality is difficult for consumers to measure, as outcomes are uncertain, search is
difficult, and consumers typically lack specialized clinical knowledge (Arrow 1963).
Policymakers in the United States have been proactive about addressing consumers’ need
for quality information on healthcare services. For over two decades they have exhorted
hospitals to become more transparent about their performance and to disclose quality information
to patients. In the ongoing efforts to inform consumers, the federal government, under the
auspices of the Centers for Medicare and Medicaid Services (CMS), the largest healthcare payer
in the US, initiated the HospitalCompare program. This program provides consumers with a
range of hospital performance metrics on a public website (http://www.hospitalcompare.hhs.gov).
A similar initiative for nursing homes supports comparative analysis of the quality of nursing
services1 and the Physician Quality Reporting System (PQRS) will eventually provide physician
ratings2.
Although quality information about hospitals and nursing homes is doubtless useful for
consumers, the utilization of these facilities is limited to less frequently occurring acute care
needs or chronic and end-of-life situations. Overall, consumption of healthcare services occurs
predominantly via consultations with a primary care physician. Yet, despite physicians’ central
role in health care delivery, consumers have limited access to systematic physician quality
information (Christianson et al. 2010; Harris et al. 2008; Jha et al. 2006). Traditionally,
consumers have relied on word-of-mouth references from friends and family when choosing
physicians.
In the absence of broadly available and easily accessible formal quality ratings of
physicians, the internet has provided a forum for consumers to fill this information vacuum. As
noted, online consumer ratings of physicians have rapidly grown. A recent study estimates that
the number of ratings has grown at an annual rate of over 90% in the past several years (Gao et al.
2010). Nearly 60% of US adults have used online health information resources (Fox et al. 2009)
– a three-fold increase from 2001 (LeGrow et al. 2001). Furthermore, many consumers use
online information to select physicians (Fox et al. 2009).
The concerns about such ratings voiced by professional medical organizations and other
advocacy groups (Dolan 2008; Martin 2009) are predicated on the assumption that consumers
lack the detailed clinical knowledge and understanding needed in order to make accurate quality
assessments. Thus, results may represent a disgruntled minority rather than reflecting average
patient experiences. Additionally, because of the inherent information asymmetry in the
physician-patient encounter, consumer ratings may provide negligible or even misleading
information. This assertion has not, however, been verified empirically and theory does not
provide a clear answer to the question of information value. While recent studies find that online
consumer ratings are predominantly positive (Gao et al. 2010; Lagu et al. 2010), we have limited
understanding regarding the mechanisms through which online ratings occur, or the
representativeness of these ratings. Understanding the mechanisms underlying the rating process
is crucial to determining the meaning and value of online rating systems.
1 http://www.medicare.gov/NHCompare/
2 Although the CMS currently collects and analyzes PQRS data for participating Medicare providers, these data are not yet made public at the physician level. See http://www.cms.gov/pqrs/ for further detail.
We build on this literature by combining online physician rating data with patient surveys
conducted offline. These data are used to understand not only which physicians receive online
ratings, but also the extent to which these ratings are representative of the average patient
population.
2.2 Online Word-of-Mouth
Online reviews of products and services are part of the broader phenomenon of online
word-of-mouth (WOM). WOM has a decades old tradition of being acknowledged as an
important determinant of consumer choice (Arndt 1967; Bass 1969; Butler 1923; Katz et al.
1955). Consumers often ask friends and colleagues to recommend restaurants when they dine in
unfamiliar areas and consult users of a product about their experience before buying a similar
item. Until about ten years ago, WOM was, by its very nature, generally limited to individuals
with close social connections (Brown et al. 1987) and/or close physical proximity. However, in
recent years, the internet has enabled online communities of consumers to exchange experiences
and opinions. As a result, the past decade has witnessed an increasing number of studies
examining the phenomenon of online consumer reviews (Clemons et al. 2006; Dellarocas et al.
2004; Dellarocas et al. 2010; Duan et al. 2008; Godes et al. 2004; Li et al. 2008; Moe et al. 2010;
Zhu et al. 2010). Although a complete review of this literature is beyond the scope of this paper,
we briefly summarize key findings and highlight gaps in understanding that our study seeks to
address.
Two conclusions can be drawn from a review of the literature on digital WOM. First, we
note that a majority of these studies focus on establishing the value proposition of digital WOM
by relating consumer reviews to financial measures. In this work, consumer reviews have been
examined from various perspectives, including the dispersion of WOM (Godes et al. 2004), the
distribution of the ratings (Clemons et al. 2006), product segment (Dellarocas et al. 2010), and
consumer characteristics (Zhu et al. 2010). As one example, a recent study by Chintagunta et al.
(2010) investigates the effects of the valence, volume, and variance of national online user
reviews on the box-office performance of feature films and finds stronger effects for review
valence than for volume. All these studies have provided important insights into the
relationship between online consumer reviews and sales; however, they have stopped short of
investigating the value of the information in these ratings, which is the focus of this study.
Second, the majority of these studies have been conducted for consumer products such as
TV shows (Godes et al. 2004), books (Chevalier et al. 2006), movies (Dellarocas et al. 2010;
Duan et al. 2008), games (Zhu et al. 2010) and beauty products (Moe et al. 2010). Notably
lacking are studies focused on consumer reviews for professional services, where there exists
even greater information asymmetry on provider quality. Although healthcare is the largest
sector of the US economy, accounting for 16% of GDP, only a handful of studies
have begun to explore this nascent phenomenon of physician online ratings (Gao et al. 2010;
Lagu et al. 2010). The emphasis in this early work has been on understanding and describing an
emerging trend, and these studies tend to be descriptive in nature. To our knowledge, no work has
investigated whether online reviews of physicians are subject to bias or undertaken systematic
research to provide quantitative estimates of such bias.
3. Theory and Hypotheses
3.1 The Nature of Bias in Online Reviews
A distinctive feature of online ratings is that they are voluntarily posted by users, i.e.,
they reflect intentional public disclosure of privately held information. The existence of a
variety of biases in such public reporting of consumption and trading experiences is
acknowledged and well-documented in the extant literature. Biases are typically a result of the
underlying social motivations and preferences of reviewers.
One bias arises as a result of the social context of online reviews. Reviewers who post
their product judgments do so in a context where others observe them and possibly use the
information to draw conclusions about the reviewer, thereby creating social pressure for the focal
reviewer. Studies indicating that perceived social pressure influences ratings beyond what the
consumer privately experiences include Moe and Trusov (2010) and Wang, Zhang, and Hann
(2010). Moe and Trusov argue that, via mechanisms related to social influence, the valence of
previously posted ratings affects future ratings behavior, and that these ratings dynamics, in turn,
influence sales. Wang, Zhang, and Hann (2010) posit that the social pressure experienced by
raters from their online “friends” affects ratings. After controlling for homophily effects, they
find that social influence is most potent when earlier ratings by friends are negative. Their study,
conducted in the context of books, further reveals that the strength of social influence is
conditioned by a variety of factors such as the popularity of the book, stage of the review cycle,
experience of the user, and the size of the user’s social network. Thus, to the degree that social
influence modifies the truthful reporting of experienced product quality, it reduces social welfare
by misleading users whose consumption choices are driven by the online reviews.
Researchers have also argued for and demonstrated the existence of temporal effects in
ratings. Early experimental work by Schlosser (2005) revealed that the presence of negative
reviews results in downward adjustments in product evaluations for future review contributors,
causing them to not reveal their true assessments. She attributes such behavior to rater
psychology whereby those holding a negative opinion are perceived to be more intelligent and
discriminating than those with a positive evaluation. Temporal effects on ratings can be
potentially problematic for consumers of the reviews, even when the veracity of the rating is not
compromised and it reflects the reviewer’s true experience. Arguing that the preferences of
those who purchase early are systematically different from late buyers and hence, are a source of
bias, Li and Hitt (2008) show that early consumer reviews of books on Amazon.com are
overwhelmingly positive, and that subsequent purchasers do not adjust for this positive bias when
drawing inferences about product quality. Self-selection into early purchase
caused by ex ante preference differences has implications for future demand.
Hu, Pavlou, and Zhang (2006) isolate two biases in online ratings that together result in a
J-shaped distribution for product reviews, where an overwhelming majority of ratings are at the
most positive end of the scale (5 stars), a small number at the low end (1 star), and very few in
between the two extremes. Noting that the shape of this distribution is inconsistent with the
expectation of a normal curve with a large number of reviews, they characterize the self-selection biases that lead to the J-shaped distribution as a purchasing bias and an under-reporting
bias. The former bias, echoing the notion of heterogeneity in ex ante consumer preferences for
products identified by Li and Hitt (2008), arises because product purchases are made by those
with higher product valuations, thereby inflating the average rating for a product. The under-reporting bias results from a higher propensity to rate a product amongst consumers that
experience extreme satisfaction or dissatisfaction, as compared with those who like or dislike the
product only moderately. Hu et al. (2009) conclude that the average rating is not an accurate
reflection of true product quality and recommend that consumers examine the entire rating
distribution.
To summarize, multiple underlying causes have been implicated in the bias observed in
online reviews. Causes of bias include systematic product preference differences among those
who elect to provide an online review and those who do not, the social context within which
reviews occur, such as the size of an individual’s friendship network, and the valence and
volume of prior reviews. To an extent, the presence of such bias is inevitable; it is important,
however, for researchers and policymakers to understand and circumvent the information loss
caused by the bias. The predominant focus in prior work has been on self-selection into
providing a review, rather than systematic differences related to the object of the review.
We address this gap by investigating two biases in the context of physician ratings. The
first, a selection bias, is related to consumers’ choice of what physicians to rate online. Drawing
on prior findings and related theory from marketing and psychology, we offer two alternative
predictions with distinct underlying theoretical mechanisms for which physicians receive ratings
online: a “sound of silence” explanation versus a “naming and shaming” explanation. The
second bias we explore is in the choice of what opinions to express online. Labeling this bias the
“hyperbole effect,” we predict that online reviews of doctors are less informative at the two ends
of the population opinions. Specific hypotheses related to these biases are developed below.
3.2 “Naming and shaming” vs. “Sound of silence” in physician selection
Consumers volitionally choose to go online and post reviews of their physicians. In the
absence of distortions in user choices about which physicians to rate online, those who are rated
online should be a representative sample of the underlying population. Yet, prior research in
marketing has extensively documented the phenomenon of customer complaining behavior
(CCB) in both online and offline settings (Bearden et al. 1979; Cho 2002; Cho et al. 2001).
Singh et al. (1988) define CCB as “a set of multiple behavioral and non-behavioral responses,
some or all of which are triggered by perceived dissatisfaction with a purchase episode.” Prior
research has also suggested that consumers’ propensity to complain is positively associated with
their level of dissatisfaction, the importance of the “purchase,” the relative benefit-cost ratio of
complaining, and their personal competence (Bearden et al. 1979; Cho 2002; Cho et al. 2001;
Landon 1977). Early work by Richins (1983) established that minor dissatisfaction evokes
minimal response from consumers, while more severe dissatisfaction was strongly correlated
with the consumer response of “telling others.”
To the extent that physicians who are lower in quality (as judged by patients) are more
likely to evoke dissatisfaction, we can expect more CCB for this group of physicians. Indeed,
critics worry that internet rating sites for physicians will become forums for disgruntled patients
to vent their anger (McCartney 2009; Miller 2007). If this “naming and shaming” assumption is
correct, then we would expect to see a disproportionately higher number of low quality
physicians being rated online. In this instance, the information value of the ratings for
consumers is considerably degraded: even if a physician is rated high compared to others,
consumers still need to exercise extra caution when making inferences about the quality of that
physician, because s/he is the “best of a bad lot.” Thus, we predict:
H1a: Online ratings of physicians are subject to the phenomenon of “naming and
shaming” in that physicians of lower quality are more likely to be rated online.
Drawing on findings related to CCB, we have argued that dissatisfied patients are more
likely to complain, yet research also exists to suggest that consumers are much less likely to
speak out about a negative experience (Dellarocas et al. 2008). Therefore, an alternative
possibility exists that online ratings yield a “sound of silence” effect, in which low quality
physicians receive fewer ratings because individuals prefer to keep their damaging evaluations
private. In their study of eBay’s feedback mechanisms, Dellarocas and Wood (2008) characterize
a reporting bias in which the propensity to reveal private opinion is a function of the outcome
experienced by the reviewer. When the outcome is negative, traders choose not to provide low
ratings for fear of retaliation from trading partners. Their empirical results show that eBay
traders are significantly more likely to report satisfactory outcomes than those that are
moderately unsatisfactory.
The limited presence of negative WOM is further corroborated by a number of other
studies. Chevalier and Mayzlin (2006) find that consumer reviews of books are strikingly
positive at Amazon.com and Barnesandnoble.com. East et al. (2007) review 15 studies and
report that positive WOM is more pervasive than negative WOM in every case, with positive
ratings/reviews exceeding negative ones by a factor of 3. Hu, Pavlou, and Zhang’s (2006)
identification of the J-shaped distribution of product ratings is, similarly, an outcome of a
disproportionately large number of positive valuations.
A patient could be motivated to remain silent after a dissatisfying experience with a
physician for a variety of reasons. It is well recognized that the patient-physician relationship is
characterized by substantial information asymmetry where the physician, by virtue of specialized
knowledge and training, is the “expert,” and the patient has limited understanding of medical
procedures, diagnoses, and treatments. Moreover, society accords physicians a unique status
because of the nature of their work: saving lives. In such situations, the patient experiences
lower power and competence and may attribute the negative experience to their own
shortcomings. When such an attribution of blame to self occurs, the patient is less likely to
report the negative experience.
Moreover, unlike the consumption of goods such as movies or books where there is
typically no personal relationship between sellers and buyers, the patient-physician relationship
is generally deeply personal and enduring. Insurance requirements and other logistical barriers
create significant switching costs to move from one physician to another. In the patient’s mind,
public reporting of a negative experience may evoke the fear of physician retaliation, much as in
the case of trading partners in an online auction, albeit in a different form. Physicians’ reprisals
could take the form of, at best, interminable delays in getting the next appointment, or, at worst,
litigation for public libel. Indeed, in the highly litigious profession of medicine, the latter form
of reprisal is not implausible. Under the assumption of this form of behavior where patients
experiencing negative physician encounters volitionally remain silent, we would expect high
quality physicians to be more likely to be rated online. Following this logic we predict:
H1b: Online ratings of physicians are subject to the “sound of silence” phenomenon in
that physicians of lower quality are less likely to be rated online.
We note that H1a and H1b indicate contrasting predictions, yet it is possible that both
biases co-exist. Prior literature or theory has not identified which bias is more dominant. We
provide empirical insights into whether the “sound of silence” effect or the “naming and shaming”
effect prevails in online physician ratings.
3.3 Hyperbole Effects
The second bias arises from the valence of the rating itself. Substantial prior research has
documented that individuals are more likely to talk about extreme experiences (Anderson 1998;
Dellarocas et al. 2006; Hu et al. 2009). Anderson (1998) argued that the marginal utility a
consumer derives from WOM activity increases as satisfaction or dissatisfaction with the product
increases, resulting in a U-shaped curve for the valence of WOM activity. Empirical data from a
cross-cultural study of customer satisfaction in the United States and Sweden supports the
posited functional form (Anderson 1998).
Theoretically, the sharing of positive experiences has been ascribed to a variety of
motivations, including altruism (providing useful information for the benefit of others),
instrumental motives (the desire to appear well informed), and general cognitive biases that favor
positive incidents (Arndt 1967; Dichter 1966; Holmes et al. 1977). Motivational mechanisms
have also been implicated in high levels of dissatisfaction giving rise to negative WOM: Hu,
Pavlou, and Zhang (2009) label this “moaning” about the experience. Consumers who are highly
dissatisfied may desire to express their hostility (Jung 1959), or seek vengeance and retribution
(Richins 1983). Cognitively, consumers who are highly satisfied or highly dissatisfied are
argued to derive homeostase utility from communicating their experiences: “expressing positive
emotions and venting negative feelings” (Henning 2004). Homeostase utility is predicated on the
fundamental tenets of balance theory or homeostasis, suggesting that individuals strive for
balance and will seek to restore equilibrium after an “unbalancing experience.” Highly positive
and highly negative (i.e., extreme) consumption experiences threaten the individual’s sense of
balance and motivate them to externalize their feelings by expressing opinions about the
consumption experience, which will then restore balance. We label this the “hyperbole” effect in
online ratings. Since physicians at the ends of the quality spectrum are more likely to evoke
extreme opinions or hyperbole, their ratings should contain less information value in reflecting
the population opinion. We predict:
H2: Online physician ratings exhibit the “hyperbole effect,” such that the online ratings
for high-end and/or low-end physicians are less informative in reflecting population opinions
than the ratings for average physicians.
4. Methodology and Results
4.1 Data
We construct the data set for empirical testing of the research hypotheses by matching
observations across four different datasets. The first dataset is an offline patient survey
conducted by the consumer advocacy group Consumers’ Checkbook, in collaboration with five
major health plans (Aetna, UnitedHealthcare, CIGNA HealthCare, BlueCross and BlueShield of
Kansas City, and BlueCross BlueShield of Tennessee). The questionnaire and survey procedure
were developed and tested by the U.S. Agency for Healthcare Research and Quality and
endorsed by the National Quality Forum, to ensure the representativeness of the patient sample for
each physician. Patients rate physicians on a scale of 1 to 100. The survey was conducted in three
major metropolitan areas within the United States (Denver, Memphis, and Kansas City). As with
most surveys, the representativeness of the offline patient data could be a matter of concern;
however, the rigor with which the Consumers’ Checkbook survey was conducted serves to
mitigate this issue. After pooling health plan claims data for physicians working with the major
health plans engaged in this effort, surveys were sent, on average, to 145 patients of each
physician (minimum 113). Roughly 50 surveys per physician were returned (a response rate
ranging from 32% in Denver to 42% in Kansas City) yielding coverage of 99.14% of the
physicians investigated.
A second dataset (the online ratings of physicians) is drawn from RateMDs.com, one of
the nation’s largest aggregators of consumers’ online physician ratings. Established in 2004,
RateMDs.com has accumulated over 1 million doctor ratings. According to a recent comparison
of physician rating websites, RateMDs.com has the most ratings with patient narratives among
major players (Lagu et al. 2010). The third dataset is the 2007 Economic Census (conducted by
the US Census Bureau), from which we gather information regarding the population and
economic conditions of the areas in which the physicians practice. The Economic Census
collects quinquennial data at the county level relating to population income, poverty,
demographics, and the emergence of new firms along with economic indicators regarding the
general fiscal health of the area. It is considered the definitive source for information related to
geographic variation in economic conditions. The final data source comes from the state medical
board websites that are responsible for physician accreditation, licensing, and disciplinary
oversight. Medical boards collect comprehensive information on doctors who practice in their
respective states, and we use their data for physician characteristics such as board certification,
physician experience, and specialty.
Our sample is constructed to reduce sample heterogeneity and control for potential
unobserved variable bias. Because consumer choice related to specialist physicians for particular
procedures (such as brain surgery) occurs less frequently, we restrict our sample to general
practitioners and family care physicians, whose services are used by all types of patients. The
resulting dataset is composed of 1,425 physicians with completed offline surveys, of whom 794
have been rated online. We next introduce our specific variables of interest
and discuss their operationalization. Table 1 provides a summary of the variables.
Our empirical strategy uses two different dependent variables. To determine the
propensity to receive an online rating we use a dichotomous measure equal to one if a physician
is Rated Online at least once and zero otherwise. This variable does not, however, directly
measure the information value of the online ratings. To examine information value, we use the
average Online Rating conditional upon being rated online at least once. Although the results are
not reported, we also examined the number of online ratings (Rating Count) to test the robustness
of our conclusions. The independent variables of interest comprise patient survey results
administered by Checkbook. Our primary measure is the physician’s average survey score,
Survey Rating, based on all patients who participated in the offline survey. We also control for
the number of surveys completed, Surveys Filled.
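To make the construction of these variables concrete, the sketch below illustrates the matching and aggregation logic. It is an illustrative sketch only: the file names, column names, and merge keys (physician_id, county_fips) are hypothetical stand-ins for identifiers in the Checkbook survey, RateMDs.com, Economic Census, and medical board extracts.

    import pandas as pd

    # Hypothetical file and column names standing in for the four matched data sources.
    survey = pd.read_csv("checkbook_survey.csv")   # physician_id, county_fips, survey_rating, surveys_filled
    online = pd.read_csv("ratemds_reviews.csv")    # physician_id, rating (one row per online review)
    boards = pd.read_csv("state_board.csv")        # physician_id, gender, board, experience, peer_rating
    county = pd.read_csv("census_2007.csv")        # county_fips, population, median_income, ...

    # Collapse individual reviews to one row per physician: Rating Count and average Online Rating.
    online_agg = (online.groupby("physician_id")["rating"]
                        .agg(rating_count="count", online_rating="mean")
                        .reset_index())

    # Left joins retain physicians who never received an online rating.
    df = (survey.merge(boards, on="physician_id", how="left")
                .merge(online_agg, on="physician_id", how="left")
                .merge(county, on="county_fips", how="left"))

    # Rated Online: dichotomous indicator equal to one if the physician has at least one online rating.
    df["rated_online"] = df["online_rating"].notna().astype(int)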
The online and population ratings are subjective evaluations of quality from the patient’s
perspective. As discussed earlier, there is considerable controversy and ambiguity surrounding
the measurement of physician clinical quality and the appropriate objective metrics for quality
assessment. Lacking appropriate clinical training and expertise required to evaluate the true
quality of medical care, it may be the case that consumers’ opinions simply reflect the quality of
factors such as inter-personal interaction (e.g. bedside manner, communication skills, and
punctuality). These assessments constitute useful information for future consumers, and
evidence suggests that they guide consumers’ choice processes (Fanjiang et al. 2007).
Additionally, the patient perspective is being increasingly used as a measure of the quality of
care, as reflected in the HCAHPS initiative endorsed by the National Quality Forum3.
Commencing October 2012, patient satisfaction will also impact Medicare reimbursement (Rangel
2010). Finally, patients’ satisfaction with their relationship with their healthcare provider has been
argued to have positive effects on clinical outcomes (Kane et al. 1997).
3 http://www.hcahpsonline.org/home.aspx
In addition to these variables, we control for a number of physician specific
characteristics that may influence both the propensity for the physician to receive an online
rating as well as the score received. These include dichotomous indicators for physician Gender
(equal to one if male and zero otherwise) and for physician certification by the state medical board
(Board). We control for the number of years the physician has practiced medicine, Experience,
which is the number of years elapsed from medical school graduation to the current year. We
control for the number of times the physician received endorsements (Peer Rating) from
physicians in another Checkbook survey.
Because physician ratings may be influenced by broader market dynamics, we also
control for a series of economic factors. These market variables are operationalized at the county
level. First among these is the Population of the county in which the physician practices.
Moreover, to account for both the geographic and digital competition the physician is facing, we
control for the number of physicians who have received Survey Ratings in the focal physician’s
ZIP code, Physician Count, as well as the number of physicians in the focal physician’s ZIP code
who have received an Online Rating, Rated Physician Count. As a physician’s patients may also
be subject to economic constraints we also control for the Median Income of the populace in the
county in which the physician practices. Finally, we control for the level of urbanization in the
city in which the physician practices (Urban and Large Urban indicating municipalities with
more than three or ten ZIP codes, respectively), as well as dummy variables indicating the
metropolitan area (Memphis and Denver; Kansas City serves as the excluded category).
Summary statistics for all variables are available in Table 1 and a correlation matrix is presented
in Table 2.
4.2 Analysis
Likelihood of Online Rating
Despite the recent rapid growth in online consumer ratings of physicians, a substantial
portion of doctors have yet to receive a consumer generated online rating. To the extent that this
variation may be driven by factors uncorrelated with physician quality, the absence of a rating
would provide no information regarding physician quality. Alternatively, selection into online
ratings could reflect the perceived quality of physicians. Hypotheses 1a and 1b offer alternative
explanations for online rating selection, where a physician’s likelihood of being rated online is
affected by her quality.
We first explore our data visually. Figure 1 describes the Survey Rating distributions for
physicians with and without online ratings. On average, Survey Rating is higher for physicians
with at least one online rating than for other physicians. These descriptive
patterns are inconsistent with Hypothesis 1a (naming and shaming), but provide some support for
Hypothesis 1b (sound of silence).
Based on the initial evidence from Figure 1, we formally test Hypotheses 1a and 1b by
modeling the probability that a physician receives an online rating as:
Pr(y = 1) = F(α1 + β1·s1 + γ1·s2 + δ1·X + θ1·M + ν),        (1)
where y is an indicator equal to one if a physician is rated and equal to zero otherwise. The
variable s1 is the physician’s Survey Rating and s2 is the number of surveys filled, while X and M
are vectors of physician and market characteristics, respectively. The terms {α1,β1,γ1,δ1,θ1} are
parameters to be estimated and ν represents an error term. The specification in (1) is estimated
using logistic regression. The first column in Table 3 reports parameter estimates for Equation 1.
We find that the coefficient of Survey Rating is 0.0135, and significant at the p<0.10 level. This
suggests that higher Survey Ratings increase the probability of receiving an online rating, thereby
supporting the “sound of silence” effect.
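A minimal sketch of this logistic regression, assuming the merged data frame and hypothetical column names from the sketch in Section 4.1 (with all controls coded numerically), is:

    import statsmodels.api as sm

    # Right-hand side mirrors Equation (1): Survey Rating (s1), Surveys Filled (s2),
    # the physician vector X, and the market vector M (hypothetical column names).
    rhs = ["survey_rating", "surveys_filled",
           "gender", "board", "experience", "peer_rating",
           "population", "median_income", "physician_count", "rated_physician_count",
           "urban", "large_urban", "memphis", "denver"]

    est_df = df.dropna(subset=rhs + ["rated_online"]).copy()
    X = sm.add_constant(est_df[rhs].astype(float))
    logit_fit = sm.Logit(est_df["rated_online"], X).fit()
    print(logit_fit.summary())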
In our baseline specification, we assume a linear relationship between Survey Rating and
the likelihood of being rated online. We next relax this restriction by allowing the coefficient of
Survey Rating to vary across its own distribution. The second specification in Table 3 builds on
Equation 1 to allow the probability of receiving an online rating to vary across the lower, middle (the
excluded category), and upper terciles of Survey Rating. The third model allows the relationship
to vary across quartiles of Survey Rating (where the third quartile is the default group).
Interestingly, we see a significant negative correlation between the lowest segment indicator and
the probability of being rated online. The lower tercile coefficient is -0.342 (p<0.05), and the
lower quartile coefficient is -0.408 (p<0.05). Evaluated at the average values of other variables,
a physician in the bottom quartile has a 10.1 percentage point reduction in the probability of being
rated online compared to an average physician. Moreover, we do not detect a significant
difference in the probability of being rated online in other ranges of Survey Rating. In other
words, high quality physicians are equally likely as average physicians to be rated online. This
provides strong evidence for rejecting Hypothesis 1a and supports the dominance of the “sound
of silence” effect.
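Continuing the same illustrative sketch, the quartile specification can be approximated by replacing the linear Survey Rating term with quantile indicators and evaluating marginal effects at the means of the other covariates:

    # Quartile indicators for Survey Rating; the third quartile is the omitted reference group.
    est_df["sr_q"] = pd.qcut(est_df["survey_rating"], 4, labels=["q1", "q2", "q3", "q4"])
    quartiles = pd.get_dummies(est_df["sr_q"], prefix="sr").drop(columns="sr_q3").astype(float)

    rhs_q = [c for c in rhs if c != "survey_rating"]
    X_q = sm.add_constant(pd.concat([quartiles, est_df[rhs_q].astype(float)], axis=1))
    logit_q = sm.Logit(est_df["rated_online"], X_q).fit()

    # The sr_q1 row of the marginal effects (evaluated at the covariate means) corresponds
    # to the bottom-quartile reduction in the probability of being rated online.
    print(logit_q.get_margeff(at="mean").summary())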
Value of Online Ratings
Next, we examine how well the online rating reflects the underlying quality perceived by
the broader patient population, conditional on the physician being rated online. We first conduct
an overall assessment of the correlation between the two types of ratings, and then formally test
whether we observe hyperbole effects toward extremes of the quality spectrum.
Figure 2 presents the distribution of Online Rating for physicians with relatively low,
medium, and high Survey Rating. A number of interesting observations can be made from these
data. Overall, we find that physicians with higher Survey Rating receive higher Online Rating.
For example, physicians in the highest tercile of Survey Rating overwhelmingly receive an
Online Rating of 5. On the other hand, high ratings are less prevalent for physicians in the
middle and lowest terciles. This provides evidence that Online Rating is positively correlated
with Survey Rating and that the former does provide useful information for consumers.
We examine this relationship formally using a variety of regression techniques. We model the value of
Online Rating (R) as:
R = α2 + β2·s1 + γ2·s2 + δ2·X + θ2·M + ε.        (2)
We estimate simple linear specifications in s1, reported in columns 1 through 3 of Table 4.
Column 1 of Table 4 reports the OLS estimation of Equation (2). We detect a strong and
significant correlation between Survey Rating and Online Rating (p<0.001). A 10-point
increase in Survey Rating (measured on a 100-point scale) leads to a 0.469-point increase in
Online Rating (which is measured on a 5-point scale). As the range of Online Rating is truncated
and the distribution is skewed, we employ alternative estimation techniques. Column 2 reports
the results from the Tobit specification. As expected, the magnitude is even greater in the Tobit
estimate (0.0732, p<0.01), indicating that Online Rating is informative and reflects population
opinions. The ordered logit yields almost the same finding as that of the Tobit model, both in
magnitude and statistical significance. This not only confirms the positive relationship, but also
suggests that it is reasonable to use the value of the ratings for further analysis; however, these
specifications lack the flexibility needed to test for bias or exaggeration in Online Rating.
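A sketch of the OLS and ordered-logit variants under the same hypothetical setup is below. Note that statsmodels does not ship a Tobit estimator, so the Tobit column would require a censored-regression routine or a hand-written likelihood; rounding the average rating for the ordered model is likewise only an illustrative simplification.

    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Physicians rated online at least once, with a non-missing average Online Rating.
    rated = est_df[est_df["rated_online"] == 1].dropna(subset=["online_rating"]).copy()

    # Column-1 analogue: OLS of the average Online Rating on Survey Rating and controls.
    ols_fit = sm.OLS(rated["online_rating"], sm.add_constant(rated[rhs].astype(float))).fit()
    print(ols_fit.summary())

    # Ordered-logit analogue; OrderedModel estimates its own thresholds, so no constant is added.
    olog_fit = OrderedModel(rated["online_rating"].round().astype(int),
                            rated[rhs].astype(float), distr="logit").fit(method="bfgs")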
Hypothesis 2 posits that the online ratings should contain less information value in
discriminating among individual physicians based on quality for those with particularly high or
low Survey Ratings. To test this hypothesis, we need to examine the relationship between online
and offline ratings in a disaggregated fashion, allowing for variation in this relationship across
the population opinion. Thus, we incorporated splines of Survey Rating, denoted by f(s1),
in a series of regressions. These regressions explore heterogeneity in the relationship between
Online Rating and Survey Rating across the distribution of Survey Rating. Splines create a
piecewise specification where the functions are spliced together at pre-defined intervals based on
the distribution of Survey Rating values (Robb 1980; Suits et al. 1978). Linear splines impose a
linear specification on parameter values between knots while cubic splines allow for more
flexible relationships.4 These models create local flexibility without creating discontinuities
(Kennedy 2003).
R = α3 + f(s1) + γ3·s2 + δ3·X + θ3·M + ε.        (3)
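One way to build the piecewise-linear basis f(s1) by hand, continuing the same illustrative sketch (a cubic basis could instead be generated, e.g., with patsy's spline terms), is:

    import numpy as np

    def linear_spline(x, knots):
        """Piecewise-linear basis: the coefficient on each column is the slope within one segment."""
        basis = {"seg1": np.minimum(x, knots[0])}
        for i, k in enumerate(knots):
            width = knots[i + 1] - k if i + 1 < len(knots) else np.inf
            basis[f"seg{i + 2}"] = np.clip(x - k, 0, width)
        return pd.DataFrame(basis, index=x.index)

    # Knots at 75 and 86, as in the linear-spline models (see footnote 4).
    spline = linear_spline(rated["survey_rating"], knots=[75, 86])
    rhs_s = [c for c in rhs if c != "survey_rating"]
    X_s = sm.add_constant(pd.concat([spline, rated[rhs_s].astype(float)], axis=1))
    spline_fit = sm.OLS(rated["online_rating"], X_s).fit()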
Results are presented in Table 5. Columns 1 through 3 include linear splines of Survey Rating
and Columns 4 through 6 use cubic splines. Specification 1 of Table 5 is estimated using OLS
and generates separate parameter estimates for three segments of physicians’ Survey Rating: the
lower quartile, the middle half, and the top quartile. The Survey Rating coefficient is 0.0397 for
the lower quartile, and significant at the p<0.01 level. This means that a 10-point increase in Survey
Rating leads to a 0.397-point increase in Online Rating. Interestingly, the Survey Rating
coefficient increases to 0.0606 (p<0.01) in the middle half segment. This is 52% higher than the
lower quartile Survey Rating coefficient, implying that Online Rating is sensitive to Survey
Rating near the middle of the distribution. We do not, however, find a significant correlation
between Survey Rating and Online Rating in the highest quartile of Survey Ratings, where the
coefficient is relatively small (β1 = 0.0167, p = 0.39), suggesting that online ratings are not
sensitive to survey ratings at the high end of the distribution.
4 Knots are set at 75 and 86 for the linear spline models. Cubic spline knots were set at 66 and 79, based on the distribution of Survey Rating following Harrell (2001, 23).
Identifying the information value of Online Rating depends on jointly examining the
intercept, slopes, and confidence intervals. Consequently, we plot the predicted values of Online
Rating for ease of interpretation. In an ideal scenario, Online Rating would accurately reflect
Survey Rating; thus, it is useful to compare the predicted values with a theoretical scenario with a
linear mapping between the two ratings, which would have an intercept of 1 and a slope of 0.04.
Predicted values and confidence intervals from Column 5, Table 5 are plotted in Figure 3, along
with the theoretical perfect information line.
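For reference, the intercept of 1 and slope of 0.04 follow from linearly mapping the endpoints of the two scales onto each other (treating the survey scale as running from 0 to 100 and the online scale from 1 to 5):

R = 1 + [(5 - 1)/(100 - 0)]·s1 = 1 + 0.04·s1,

so that a Survey Rating of 0 maps to an Online Rating of 1 and a Survey Rating of 100 maps to 5.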
Figure 3 demonstrates that there is a positive correlation between Online Rating and
Survey Rating at the low end. The slope doubles between the first and second knots and is
significantly different from the perfect information line. The slope then falls again in the final
segment, with the predicted values remaining significantly higher than the perfect information
line. Note that the prediction is based on a Tobit model. Because the observed online ratings
reach a ceiling at the high end boundary of 5 out of 5, users will not be able to discern
differences among physicians with predicted values exceeding 5.
Overall, we find that Online Rating is relatively effective at distinguishing physicians
of average quality. However, Online Rating has less discerning power in separating low-end
physicians, and almost no ability to distinguish quality variation within the top quartile of Survey
Rating (where most online ratings are capped at 5 out of 5). With these findings, we find support
for H2 and conclude that online ratings of physicians are subject to the “hyperbole” effect.
4.3 Robustness
The conclusions drawn in Section 4.2 are subject to a number of assumptions. The
relationship between Online Rating and Survey Rating is plausibly affected by several factors;
for example, continuous and Likert scales usually yield different distributions. We therefore
performed a wide range of specification tests to explore this issue. First, we employed several
different estimation strategies as demonstrated in Table 4. Second, we estimated alternative
spline functions to those reported in Table 5, choosing different numbers of knots and changing
knot locations. Third, we used alternative specifications of f in Equation 3, including polynomial
and categorical Survey Rating measures. Every specification supported the two crucial features
of Figure 3: 1) the predicted line is monotonically increasing in Survey Rating; and 2) Online
Rating is exaggerated for the upper tail of Survey Rating.
One remaining concern is that the Online Rating equations suffer from selection bias
based on individual consumers’ experiences. While Survey Rating may be an unbiased measure
of average perceived quality, a consumer’s rating is undoubtedly sensitive to their individual
experiences. This potential form of selection would be a concern if we were attempting to
measure the causal effect of Survey Rating on consumer experiences5. Our objective, however,
is to measure if, and when, Online Rating accurately reflects Survey Rating, irrespective of the
mechanism driving such variation.
5 One approach is to employ a Heckman selection model to jointly estimate selection into being rated online and the value of Online Ratings. Potential instruments include geographic variation in the propensity for receiving an Online Rating. However, this approach should be applied with caution given that the instruments are weak and the data do not fit the distributional assumptions of the selection correction model.
5. Discussion and Conclusion
In this paper, we examined the information value of online physician ratings. Based on a
review of related literature, we identified two potential sources of bias inherent in online ratings:
the selection of physicians to rate, and the selection of opinions to express. We then derived
formal hypotheses related to these two biases, which were subsequently tested based on a unique
dataset of 1,425 primary care physicians in three metropolitan areas: Denver, Kansas City, and
Memphis. This study represents one of the first efforts in quantifying the information value of
online ratings.
We have three notable and interesting findings. First, physicians with low survey ratings
are less likely to be rated online, thereby supporting the predicted existence of the “sound of
silence” effect. Second, we find there is a strong correlation between the online ratings and
offline population opinion. Third, the association is strongest in the middle of the physician quality
spectrum and weaker at the high end. These results suggest that online ratings are more
informative of differences between physicians in average and low quality segments, but online
ratings are less effective in differentiating quality among high quality physicians. Given that
patients are in greater need of avoiding bad physicians than they are of finding the very best
physicians, these findings are encouraging. They suggest that the information provided by
online ratings has substantial social welfare implications.
The evidence presented in this study - that online opinions of products do indeed reflect
population opinions - constitutes an important contribution to the body of work on online
reviews. Although prior work has alluded to the presence of a variety of selection biases in
online ratings and digital WOM (Hu et al. 2006; Li et al. 2008), none of these studies have
explored a possible correlation between the opinions of online raters and offline consumers.
While this finding may appear intuitive, the nature of the self-selection issues that have been
discussed and empirically demonstrated in the realm of online ratings is sufficient to make
researchers question this ex ante assumption.
Theoretically, this study contributes to extant understanding of the information value of
digital WOM by proposing two new hypotheses related to the nature of bias in online reviews.
We find evidence of a dominant role for the “sound of silence” effect. While the “sound of
silence” has been demonstrated in other contexts, most notably in investigations of the online
marketplace eBay (Dellarocas et al. 2008), its existence in environments where agents cannot utilize
direct retaliation strategies has yet to be confirmed. One area of fruitful future research
stemming from this study may be investigation into the factors that delineate groups exhibiting
the “sound of silence” from those which engage in “naming and shaming.”
A second theoretical advancement lies in the exploration of the hyperbole effect in the
context of online physician reviews. Although prior literature has documented hyperbole in
online reviews, this study is the first to empirically demonstrate the effects of hyperbole
on the ability of consumers to delineate quality. We were able to identify the areas in the
distribution where consumers may glean the most and the least actual value and insight from the
differences in ratings. In this respect we have not only demonstrated the value of the online
rating but also identified where ratings provide limited information content in distinguishing
physicians from one another.
Additionally, this study contributes to the emerging health IT literature. Most studies
have focused on IT in hospital settings and investigated questions related to adoption or payoffs
(e.g., Devaraj and Kohli 2003; Goh, Gao, and Agarwal 2011; Kane and Labianca 2011). With
the infusion of health IT into the consumer domain, there is a great need to expand the realm of
health IT research (Agarwal et al. 2010). One of the most dynamic and rapidly evolving areas is
how the internet, and especially Web 2.0 tools, affects health. Studies have examined how patients
seek information on social networks (Yan and Tan 2010). We investigate the other side of the
question: how patients decide to share their assessments of physicians online, and how those
opinions are biased (Lee et al. 2006).
Two major implications for policy and practice follow from our findings. First, as noted
previously, the AMA and other physician advocacy groups have been the staunchest opponents
of the online aggregation of physician ratings. While their concerns that patients lack the
training to evaluate physician quality remain a matter of speculation, as well as an avenue of
future investigation, the concern that ratings aggregation sites will become digital soapboxes for
disgruntled patients appears to be unfounded. Not only are the user ratings overwhelmingly
positive on these websites, but the reflection of offline population opinion in the online reviews
suggests that these reviews are informative.
The second implication for policy, which becomes apparent from this investigation, is the
pressing need for a national repository for online ratings. Much of the recent discourse in
healthcare transformation is focused around the notion of a patient-centric healthcare system
(Krist et al. 2011), in which the patient is viewed as an informed and engaged actor in managing
his/her healthcare and well-being. To the degree that such a vision demands transparency and
easily accessible information to drive patient decision making in consequential areas such as
physician choice, and because studies suggest that patients value the opinions of others more
than clinical quality measures (Fanjiang et al. 2007), repositories can facilitate important
comparisons and trade-offs. These data may be complementary to clinical quality measures
proposed for the PQRS.
Two limitations of this study represent fruitful opportunities for future work. First, our
findings are based on primary care physicians, and therefore one should exercise caution in
generalizing results to other services. Extending this study to different specialties in healthcare
services such as cardiologists and obstetricians would provide evidence on the robustness of our
findings. Extending the study to additional professional services would further illuminate the
existence and nature of biases in online ratings of this important category of services.
Second, this study investigates the bias associated with online ratings by comparing them
with those provided in offline surveys of patients. The offline survey was conducted using a
scientific approach with sufficient sample size and reasonable response rate, thereby constituting
a reasonable basis for identifying bias introduced by the online setting. Nevertheless, survey
opinions might be subject to the same selection biases discussed in Section 3. We note that if
this were indeed the case, the presence of offline bias would have tended to eliminate or at least
diminish differences between the online ratings and offline surveys. The fact that we observe
differences makes our findings more robust. Finally, patient surveys provide important insights
into physician quality and are increasingly emphasized in recent policy initiatives as important
indicators of physician performance. A natural extension of this work is to examine the
relationship between online ratings and physicians’ clinical quality, which we leave to future
research.
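The attenuation argument above can be made concrete with a simple illustrative formalization; the notation is introduced here purely for exposition and is not a model estimated in the paper. Let $Q_j$ denote physician $j$'s underlying quality, and write each channel's rating as quality plus a channel-specific reporting bias:

\[
R^{online}_j = Q_j + b^{online}_j, \qquad R^{survey}_j = Q_j + b^{survey}_j .
\]

The observable online-offline gap then identifies only the difference in biases,

\[
E\!\left[R^{online}_j - R^{survey}_j\right] = E\!\left[b^{online}_j\right] - E\!\left[b^{survey}_j\right],
\]

so if the offline survey shares some of the online channel's positive selection (that is, $E[b^{survey}_j] > 0$), the observed gap understates $E[b^{online}_j]$, and the online bias documented here is, if anything, a conservative estimate.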
In conclusion, this paper provides one of the first empirical studies of the information value of online ratings in reflecting underlying population opinion. We advance knowledge of the nature of online ratings by quantifying the biases associated both with the propensity of a physician to be rated online and with the value of the rating received. If the growth and effects of online reviews in other product categories are a good bellwether, we can expect consumers to turn increasingly to the internet when making choices related to professional services. As internet penetration increases, consumer-generated ratings will likely continue to grow and become an even more prominent information channel in critical sectors of our society such as healthcare. This naturally raises the question of what caution consumers need to exercise when making consumption choices based on online reviews. The findings from this study yield important insights into how and when online ratings can be used to make more informed physician selection decisions.
References
Agarwal, R., Gao, G., DesRoches, C., and Jha, A.K. "Research Commentary: The Digital Transformation of Healthcare: Current Status and the Road Ahead," Information Systems Research (21:4) 2010, pp 796-809.
Anderson, E.W. "Customer Satisfaction and Word of Mouth," Journal of Service Research (1:1) 1998, pp 5-17.
Arndt, J. "Role of Product-Related Conversations in the Diffusion of a New Product," Journal of
Marketing Research (4:3) 1967, pp 291-295.
Arrow, K.J. "Uncertainty and the Welfare Economics of Medical Care," The American Economic
Review (53:5) 1963, pp 941-973.
Bacon, N. "Will doctor rating sites improve standards of care? Yes," BMJ (338) 2009.
Bass, F. "A new product growth model for consumer durables," Management Science (15:5)
1969, p 13.
Bearden, W.O., and Mason, J. "Elderly Use of In-Store Information Sources and Dimensions of
Product Satisfaction/Dissatisfaction," Journal of Retailing (55:Winter) 1979, p 12.
Brown, J.J., and Reingen, P.H. "Social Ties and Word-of-Mouth Referral Behavior," The
Journal of Consumer Research (14:3) 1987, pp 350-362.
Butler, R. "Marketing and Merchandising," Alexander Hamilton Institute, New York, NY, pp. 20-21.
Chen, Y., and Xie, J. "Third-Party Product Review and Firm Marketing Strategy," Marketing
Science (24:2) 2005, pp 218-240.
Chevalier, J.A., and Mayzlin, D. "The Effect of Word of Mouth on Sales: Online Book
Reviews," Journal of Marketing Research (JMR) (43:3) 2006, pp 345-354.
Chintagunta, P.K., Gopinath, S., and Venkataraman, S. "The Effects of Online User Reviews on
Movie Box Office Performance: Accounting for Sequential Rollout and Aggregation
Across Local Markets," Marketing Science (29:5) 2010, pp 944-957.
Cho, Y. "An Analysis of Online Customer Complaints: Implications for Web Complaint
Management," 2002, pp. 176b-176b.
Cho, Y., Im, I., Hitlz, S., and Fjermestad, J. "Causes and Outcomes of Online Customer
Complaining Behavior: Implications for Customer Relationship Management (CRM),"
Americas Conference on Information Systems, 2001.
Christianson, J., Volmar, K., Alexander, J., and Scanlon, D. "A Report Card on Provider Report
Cards: Current Status of the Health Care Transparency Movement," Journal of General
Internal Medicine (25:11) 2010, pp 1235-1241.
Clemons, E.K., Gao, G.G., and Hitt, L.M. "When Online Reviews Meet Hyperdifferentiation: A Study of the Craft Beer Industry," Journal of Management Information Systems (23:2) 2006, pp 149-171.
Dellarocas, C., Awad, N., and Zhang, X. "Exploring the Value of Online Reviews to
Organizations: Implications for Revenue Forecasting and Planning," International
Conference on Information Systems, 2004.
Dellarocas, C., Gao, G., and Narayan, R. "Are Consumers More Likely to Contribute Online Reviews for Hit or Niche Products?," Journal of Management Information Systems (27:2) 2010, pp 127-157.
Dellarocas, C., and Narayan, R. "A Statistical Measure of a Population's Propensity to Engage in
Post-Purchase Online Word-of-Mouth," Statistical Science (21:2) 2006, pp 277-285.
Dellarocas, C., and Wood, C.A. "The Sound of Silence in Online Feedback: Estimating Trading Risks in the Presence of Reporting Bias," Management Science (54:3) 2008, pp 460-476.
Dichter, E. "How Word-of-Mouth Advertising Works," Harvard Business Review (44:6) 1966, p
147.
Dolan, P. "Patients Rarely Use Online Ratings to Pick Physicians," in: American Medical News,
American Medical Association, Washington DC, 2008.
Duan, W., Gu, B., and Whinston, A.B. "Do online reviews matter? -- An empirical investigation
of panel data," Decision Support Systems (45:4) 2008, pp 1007-1016.
East, R., Hammond, K., and Wright, M. "The relative incidence of positive and negative word of
mouth: A multi-category study," International Journal of Research in Marketing (24:2)
2007, pp 175-184.
Fanjiang, G., von Glahn, T., Chang, H., Rogers, W., and Safran, D. "Providing Patients Web-based Data to Inform Physician Choice: If You Build It, Will They Come?," Journal of General Internal Medicine (22:10) 2007, pp 1463-1466.
Fox, S., and Jones, S. "The Social Life of Health Information," Pew Research Center.
Gao, G., McCullough, J., Agarwal, R., and Jha, A. "A Premier Study of Online Doctor Ratings,"
Workshop on Information Systems and Economics, St Louis, MO, 2010.
Gawande, A. Complications: A Surgeon’s Notes on an Imperfect Science Metropolitan Books,
New York, NY, 2002.
Godes, D., and Mayzlin, D. "Using Online Conversations to Study Word-of-Mouth Communication," Marketing Science (23:4) 2004, pp 545-560.
Harris, K., and Buntin, M. "Choosing a Health Care Provider: The Role of Quality Information,"
Policy (1:14) 2008.
Harrison-Walker, L.J. "The Measurement of Word-of-Mouth Communication and an Investigation of Service Quality and Customer Commitment As Potential Antecedents," Journal of Service Research (4:1) 2001, pp 60-75.
Henning, P. "Everyday Cognition and Situated Learning," in: Handbook of Research on Educational Communications, 2004.
Holmes, J., and Lett, J. "Product Sampling and Word of Mouth," Journal of Advertising
Research (17:5) 1977, p 6.
Hu, N., Pavlou, P.A., and Zhang, J. "Can online reviews reveal a product's true quality?:
empirical findings and analytical modeling of Online word-of-mouth communication," in:
Proceedings of the 7th ACM conference on Electronic commerce, ACM, Ann Arbor,
Michigan, USA, 2006, pp. 324-330.
Hu, N., Zhang, J., and Pavlou, P.A. "Overcoming the J-shaped distribution of product reviews,"
Commun. ACM (52:10) 2009, pp 144-147.
Jha, A.K., and Epstein, A.M. "The Predictive Accuracy Of The New York State Coronary Artery Bypass Surgery Report-Card System," Health Affairs (25:3) 2006, pp 844-855.
Jha, N., Premarajan, K.C., Nagesh, S., Khanal, S., and Thapa, L.B. "Five-star Doctors for the 21st Century," Journal of Health Management (7:2) 2005, pp 237-247.
Jung, C. "Concerning the archetypes, with special reference to the anima concept," Collected Works, 1959.
Kane, R.L., Maciejewski, M., and Finch, M. "The Relationship of Patient Satisfaction with Care
and Clinical Outcomes," Medical Care (35:7) 1997, pp 714-730.
Katz, E., and Lazarsfeld, P. Personal Influence The Free Press of Glencoe, Glencoe, IL, 1955.
Kennedy, P. A Guide to Econometrics The MIT Press, Cambridge, MA, 2003.
Krist, A., and Woolf, S. "A vision for patient-centered health information systems," JAMA (305) 2011, p 2.
Lagu, T., Hannon, N., Rothberg, M., and Lindenauer, P. "Patients’ Evaluations of Health Care
Providers in the Era of Social Networking: An Analysis of Physician-Rating Websites,"
Journal of General Internal Medicine (25:9) 2010, pp 942-946.
Landon, L. "A Model of Consumer Complaint Behavior," in: Consumer Satisfaction,
Dissatisfaction and Complaining Behavior, 1977.
LeGrow, G., and Metzger, J. "E-Disease Management," California HealthCare Foundation.
Levine, J. "Welcoming Transparency: Some Physicians Find Merit in Online Ratings," The
Journal of Perinatal Education (18:3) 2009, p 7.
Li, X., and Hitt, L.M. "Self-Selection and Information Role of Online Product Reviews," Information Systems Research (19:4) 2008, pp 456-474.
Martin, R. "To Doctors' Dismay, Web Ratings Grow.," in: St Petersburg Times, St Petersburg,
FL, 2009.
McCartney, M. "Will doctor rating sites improve the quality of care? No," BMJ (338) 2009.
Miller, C. "What Would Hippocrates Do?," in: Forbes Magazine, 2007.
Moe, W., and Trusov, M. "The Value of Social Dynamics in Online Product Ratings Forums," Journal of Marketing Research, 2010.
Rangel, C. "Patient Protection and Affordable Care Act," in: Pub. L. 111-148, U.S. Congress, 2010.
Richins, M.L. "Negative Word-of-Mouth by Dissatisfied Consumers: A Pilot Study," The
Journal of Marketing (47:1) 1983, pp 68-78.
Robb, L. "Accounting for Seasonality with Spline Functions," The Review of Economics and
Statistics (62:2) 1980, p 3.
Schlosser, A.E. "Posting versus Lurking: Communicating in a Multiple Audience Context," The Journal of Consumer Research (32:2) 2005, pp 260-265.
Singh, J. "Consumer Complaint Intentions and Behavior: Definitional and Taxonomical Issues,"
The Journal of Marketing (52:1) 1988, pp 93-107.
Solomon, S. "Doc seeks $12 million for RateMDs comments," National Review of Medicine, 2008.
Suits, D.B., Mason, A., and Chan, L. "Spline Functions Fitted by Standard Regression Methods,"
The Review of Economics and Statistics (60:1) 1978, p 8.
Wang, C., Zhang, X., and Hann, I.-H. "The Value of Social Dynamics in Online Product Ratings
Forums," in: Workshop on Information Systems Economics, St Louis, MO, 2010.
Zhu, F., and Zhang, X. "Impact of Online Consumer Reviews on Sales: The Moderating Role of
Product and Consumer Characteristics," Journal of Marketing (74:2) 2010, pp 133-148.
Figure 1. The Distributions of Survey Ratings for Physicians Rated and Unrated Online
[Figure: overlaid distributions of offline survey ratings for physicians who are rated online versus unrated online; horizontal axis: survey rating (30 to 95); vertical axis: percent of physicians (0% to 30%); series: Unrated, Rated.]
Figure 2. Distribution of Online Ratings by Tercile of Survey Rating
[Figure: distribution of online ratings (1 to 5) by tercile of the offline survey rating; vertical axis: percent of physicians within tercile (0% to 90%); series: Lowest, Middle, High.]
Figure 3. Predicted Online Ratings as a Function of Survey Rating with 95% Confidence Intervals
[Figure: predicted online rating plotted against the survey rating, with 95% confidence intervals.]
Based on Tobit estimates with cubic splines (Column 5, Table 5).
Table 1: Summary Statistics

Variable                        Mean      Std. Dev.   Min       Max
Online Ratings
  Rated Online (yes=1)          0.557     0.497       0         1
  Online Rating value           4.021     1.215       1         5
  Rating Count                  1.772     2.749       0         24
Survey Ratings
  Survey Rating                 79.782    8.199       34        96
  Surveys Filled                55.232    15.828      10        105
Physician Characteristics
  Board                         0.799     0.401       0         1
  Gender                        0.681     0.466       0         1
  Experience                    23.968    9.292       5         60
  Peer Rating                   0.164     0.918       0         12
Market Characteristics
  Population (in thousands)     518.123   273.477     8.969     897.472
  Physician Count               15.588    13.790      0         57
  Rated Physician Count         7.161     6.321       0         27
  Urban                         0.535     0.499       0         1
  Large Urban                   0.403     0.491       0         1
  Median Income                 47.212    10.912      23.27     82.929
  Denver                        0.295     0.456       0         1
  Memphis                       0.269     0.443       0         1
Table 2: Correlation Matrix

Variables: (1) Online Rating, (2) Rating Count, (3) Survey Rating, (4) Rated Online, (5) Surveys Filled, (6) Board, (7) Gender, (8) Experience, (9) Peer Rating, (10) Population, (11) Physician Count, (12) Rated Physician Count, (13) Urban, (14) Large Urban, (15) Median Income.

                             (1)      (2)      (3)      (4)      (5)      (6)      (7)
(2)  Rating Count           -0.093
(3)  Survey Rating           0.329   -0.07
(4)  Rated Online            .        0.569    0.072
(5)  Surveys Filled          0.166    0.009    0.437    0.061
(6)  Board                   0.04    -0.027    0.078   -0.001    0.051
(7)  Gender                  0.121   -0.069    0.119   -0.024    0.129   -0.082
(8)  Experience              0.063   -0.041    0.049   -0.028    0.217   -0.269    0.347
(9)  Peer Rating             0.083    0.005    0.144    0.043    0.216    0.057    0.093
(10) Population              0.051   -0.003    0.199    0.033    0.05     0.051   -0.027
(11) Physician Count         0.075   -0.012    0.227    0.067    0.155    0.091    0.011
(12) Rated Physician Count   0.074    0.044    0.236    0.071    0.119    0.093    0.014
(13) Urban                  -0.018   -0.005    0.074    0.037   -0.032    0.084    0.005
(14) Large Urban            -0.002    0.013    0.135    0.004   -0.053    0.05    -0.016
(15) Median Income           0.006    0.164   -0.075    0.094   -0.027    0.043   -0.054

                             (8)      (9)      (10)     (11)     (12)     (13)     (14)
(9)  Peer Rating             0.059
(10) Population              0.083    0.052
(11) Physician Count         0.074    0.08     0.532
(12) Rated Physician Count   0.044    0.066    0.462    0.853
(13) Urban                   0.036    0.038    0.364    0.112    0.235
(14) Large Urban             0.062    0.049    0.419    0.193    0.276    0.745
(15) Median Income          -0.095   -0.011   -0.444   -0.248   -0.124   -0.357   -0.406

(A "." denotes a correlation that is not defined.)
Table 3: Probability of a Physician Being Rated Online

                                       (1) Base        (2) Tercile     (3) Quartile
                                       Regression      Regression      Regression
Survey Rating                          0.0135*
                                       (0.00772)
Lower Tercile of Survey Rating                         -0.342**
                                                       (0.136)
Upper Tercile of Survey Rating                         -0.0234
                                                       (0.139)
Lower Quartile of Survey Rating                                        -0.408**
                                                                       (0.160)
2nd Lowest Quartile of Survey Rating                                   -0.136
                                                                       (0.150)
Upper Quartile of Survey Rating                                        0.00799
                                                                       (0.165)
Denver                                 0.283*          0.291**         0.290**
                                       (0.145)         (0.146)         (0.146)
Memphis                                -0.0454         -0.0509         -0.0594
                                       (0.177)         (0.178)         (0.178)
Surveys Filled                         0.00604         0.00567         0.00520
                                       (0.00423)       (0.00413)       (0.00413)
Urban                                  0.386**         0.390**         0.398**
                                       (0.186)         (0.186)         (0.187)
Large Urban                            -0.185          -0.199          -0.213
                                       (0.184)         (0.184)         (0.185)
Physician Count                        0.0178**        0.0176**        0.0181**
                                       (0.00890)       (0.00890)       (0.00891)
Rated Physician Count                  -0.0151         -0.0151         -0.0159
                                       (0.0188)        (0.0188)        (0.0188)
Population                             0.000184        0.000201        0.000189
                                       (0.000279)      (0.000279)      (0.000279)
Median Income                          0.0210***       0.0211***       0.0208***
                                       (0.00673)       (0.00674)       (0.00674)
Experience                             -0.0105         -0.0109         -0.0106
                                       (0.00677)       (0.00678)       (0.00677)
Gender                                 -0.0110         -0.00241        -0.0144
                                       (0.126)         (0.126)         (0.126)
Board                                  -0.223          -0.220          -0.217
                                       (0.145)         (0.145)         (0.146)
Constant                               -2.201***       -0.977**        -0.918*
                                       (0.671)         (0.462)         (0.473)
LR Chi2                                47.31           60.08           59.66
Pseudo R2                              0.0243          0.0297          0.0295
N                                      1,425           1,425           1,425

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
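For readers who want to reproduce the spirit of the selection model in Table 3 on their own data, a minimal sketch follows. It is not the authors' code: the data file, the column names, and the use of a logit link are assumptions made purely for illustration.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file; all column names below are illustrative only.
df = pd.read_csv("physicians.csv")

# Tercile indicators for the offline survey rating (middle tercile omitted),
# mirroring the specification reported in Column (2) of Table 3.
df["tercile"] = pd.qcut(df["survey_rating"], 3, labels=["low", "mid", "high"])
df["low_tercile"] = (df["tercile"] == "low").astype(int)
df["high_tercile"] = (df["tercile"] == "high").astype(int)

# Binary-outcome model of whether a physician is rated online at all.
selection = smf.logit(
    "rated_online ~ low_tercile + high_tercile + denver + memphis"
    " + surveys_filled + urban + large_urban + physician_count"
    " + rated_physician_count + population + median_income"
    " + experience + gender + board",
    data=df,
).fit()
print(selection.summary())  # coefficients, pseudo R-squared, log-likelihoods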
Table 4: Reflection of Survey Ratings on Online Rating Values, Base Specifications

                             (1) OLS        (2) Tobit      (3) Ordered Logit
Survey Rating                0.0469***      0.0732***      0.0744***
                             (0.00595)      (0.0107)       (0.00985)
Peer Rating                  0.0349         0.128          0.0802
                             (0.0413)       (0.0875)       (0.0747)
Rating Count                 -0.0283**      -0.108***      -0.110***
                             (0.0139)       (0.0244)       (0.0205)
Denver                       -0.0654        -0.166         -0.144
                             (0.109)        (0.199)        (0.173)
Memphis                      -0.103         -0.254         -0.139
                             (0.150)        (0.273)        (0.243)
Surveys Filled               -0.000392      0.000542       -0.00116
                             (0.00335)      (0.00609)      (0.00533)
Urban                        -0.0714        -0.0213        0.0577
                             (0.134)        (0.247)        (0.216)
Large Urban                  -0.0414        -0.103         -0.120
                             (0.133)        (0.244)        (0.213)
Physician Count              -3.71e-05      0.00335        0.00178
                             (0.00648)      (0.0121)       (0.0106)
Rated Physician Count        0.00208        -0.00391       -0.000599
                             (0.0138)       (0.0257)       (0.0224)
Population                   0.000149       0.000377       0.000188
                             (0.000232)     (0.000422)     (0.000378)
Median Income                0.000425       0.00946        0.00761
                             (0.00497)      (0.00904)      (0.00783)
Experience                   0.00432        0.00909        0.00908
                             (0.00520)      (0.00957)      (0.00837)
Gender                       0.177*         0.300*         0.239
                             (0.0943)       (0.170)        (0.148)
Board                        0.0724         0.0880         0.0783
                             (0.108)        (0.196)        (0.172)
Constant                     -0.113         -1.964**
                             (0.509)        (0.922)
N                            794            794            794
Chi2                                                       131.95
R2                           0.125

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
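A comparable sketch for the rating-value specifications in Table 4 is shown below, again with hypothetical file and variable names rather than the authors' code. Only the OLS and ordered logit columns are illustrated; the Tobit column would additionally require a censored-regression estimator that accounts for the 1-to-5 bounds of the online rating.

import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("physicians.csv")  # hypothetical analysis file
rated = df[df["rated_online"] == 1]  # rating values exist only for rated physicians

controls = (
    "peer_rating + rating_count + denver + memphis + surveys_filled"
    " + urban + large_urban + physician_count + rated_physician_count"
    " + population + median_income + experience + gender + board"
)

# Column (1): linear regression of the online rating on the survey rating.
ols_fit = smf.ols("online_rating ~ survey_rating + " + controls, data=rated).fit()

# Column (3): ordered logit treating the 1-5 online rating as an ordinal outcome.
exog_cols = [
    "survey_rating", "peer_rating", "rating_count", "denver", "memphis",
    "surveys_filled", "urban", "large_urban", "physician_count",
    "rated_physician_count", "population", "median_income",
    "experience", "gender", "board",
]
ologit_fit = OrderedModel(
    rated["online_rating"], rated[exog_cols], distr="logit"
).fit(method="bfgs")

print(ols_fit.summary())
print(ologit_fit.summary())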
Table 5: Reflection of Survey Rating on Online Rating Values, Allowing for Flexibility Across the Survey Rating Distribution

Columns (1)-(3): splines defined at the quartiles of the survey rating
                                   (1) Spline     (2) Spline     (3) Spline Model
                                   Model (OLS)    Model (Tobit)  (Ordered Logit)
Spline-Lower Quartile              0.0397***      0.0488**       0.0506***
  of Survey Rating                 (0.0122)       (0.0214)       (0.0194)
Spline-Middle Half                 0.0606***      0.0969***      0.0993***
  of Survey Rating                 (0.0130)       (0.0235)       (0.0210)
Spline-Upper Quartile              0.0167         0.0663         0.0580
  of Survey Rating                 (0.0272)       (0.0527)       (0.0454)
Peer Rating                        0.0417         0.125          0.0793
                                   (0.0419)       (0.0877)       (0.0749)
Rating Count                       -0.0279**      -0.109***      -0.110***
                                   (0.0139)       (0.0244)       (0.0205)
Denver                             -0.0606        -0.146         -0.124
                                   (0.109)        (0.199)        (0.174)
Memphis                            -0.0959        -0.276         -0.160
                                   (0.150)        (0.274)        (0.244)
Surveys Filled                     -0.000354      0.000362       -0.00139
                                   (0.00335)      (0.00609)      (0.00535)
Urban                              -0.0634        -0.0196        0.0556
                                   (0.135)        (0.247)        (0.216)
Large Urban                        -0.0485        -0.111         -0.122
                                   (0.133)        (0.244)        (0.213)
Physician Count                    0.000349       0.00384        0.00253
                                   (0.00648)      (0.0122)       (0.0107)
Rated Physician Count              0.00102        -0.00568       -0.00265
                                   (0.0138)       (0.0257)       (0.0225)
Population                         0.000143       0.000383       0.000197
                                   (0.000232)     (0.000422)     (0.000378)
Median Income                      0.00422        0.00901        0.00718
                                   (0.00497)      (0.00904)      (0.00783)
Experience                         0.00435        0.00876        0.00892
                                   (0.00521)      (0.00957)      (0.00839)
Gender                             0.178*         0.310*         0.244*
                                   (0.0946)       (0.170)        (0.148)
Board                              0.0784         0.0819         0.0748
                                   (0.109)        (0.196)        (0.172)
Constant                           0.345          -0.259
                                   (0.910)        (1.591)
N                                  794            794            794
Chi2                                                             134.06
R2                                 0.127

Columns (4)-(6): splines defined at the low end, middle, and high end of the survey rating
                                   (4) Spline     (5) Spline     (6) Spline Model
                                   Model (OLS)    Model (Tobit)  (Ordered Logit)
Spline-Low end                     0.0423***      0.0532***      0.0557***
  of Survey Rating                 (0.0109)       (0.0191)       (0.0173)
Spline-Middle                      0.0717***      0.117***       0.117***
  of Survey Rating                 (0.0194)       (0.0353)       (0.0311)
Spline-High end                    0.0176         0.0537         0.0530
  of Survey Rating                 (0.0208)       (0.0398)       (0.0343)
Peer Rating                        0.0424         0.128          0.0805
                                   (0.0417)       (0.0872)       (0.0746)
Rating Count                       -0.0273*       -0.108***      -0.109***
                                   (0.0139)       (0.0244)       (0.0205)
Denver                             -0.0621        -0.149         -0.127
                                   (0.109)        (0.199)        (0.174)
Memphis                            -0.0922        -0.271         -0.158
                                   (0.150)        (0.274)        (0.244)
Surveys Filled                     -0.000296      0.000338       -0.00140
                                   (0.00336)      (0.00609)      (0.00535)
Urban                              -0.0591        -0.0139        0.0624
                                   (0.135)        (0.247)        (0.216)
Large Urban                        -0.0499        -0.112         -0.124
                                   (0.133)        (0.244)        (0.213)
Physician Count                    0.000544       0.00405        0.00280
                                   (0.00649)      (0.0121)       (0.0107)
Rated Physician Count              0.000506       -0.00623       -0.00340
                                   (0.0138)       (0.0257)       (0.0225)
Population                         0.000141       0.000376       0.000188
                                   (0.000232)     (0.000422)     (0.000378)
Median Income                      0.00427        0.00894        0.00700
                                   (0.00497)      (0.00904)      (0.00783)
Experience                         0.00433        0.00894        0.00915
                                   (0.00520)      (0.00956)      (0.00839)
Gender                             0.177*         0.308*         0.243
                                   (0.0945)       (0.170)        (0.148)
Board                              0.0765         0.0823         0.0745
                                   (0.109)        (0.196)        (0.172)
Constant                           0.159          -0.560
                                   (0.835)        (1.461)
N                                  794            794            794
Chi2                                                             134.14
R2                                 0.128

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
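Finally, a minimal sketch of a flexible (spline) specification in the spirit of Table 5 is given below. It uses a cubic B-spline in the survey rating via patsy's bs() transform; the paper's exact spline construction and knot placement (for example, at survey-rating quartiles) are not reproduced, and the file and variable names remain hypothetical.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("physicians.csv")  # hypothetical analysis file
rated = df[df["rated_online"] == 1]

# bs() expands the survey rating into cubic B-spline basis terms, allowing the
# association with the online rating to differ across the survey-rating distribution.
spline_fit = smf.ols(
    "online_rating ~ bs(survey_rating, df=4)"
    " + peer_rating + rating_count + denver + memphis + surveys_filled"
    " + urban + large_urban + physician_count + rated_physician_count"
    " + population + median_income + experience + gender + board",
    data=rated,
).fit()
print(spline_fit.summary())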