A Digital Soapbox? The Information Value of Online Physician Ratings

Guodong (Gordon) Gao and Brad N. Greenwood
Robert H. Smith School of Business, University of Maryland

Jeffrey McCullough
School of Public Health, University of Minnesota

Ritu Agarwal
Robert H. Smith School of Business, University of Maryland

September 2011

Working Paper. Please do not quote or cite without permission.

A Digital Soapbox? The Information Value of Online Physician Ratings

Abstract

The internet has fostered widespread information transparency and a rapid rise in consumer-generated ratings. While these ratings, in the form of digital word-of-mouth, provide a potentially useful source of information to guide consumer decision making, they may be subject to bias. Notably controversial are online consumer ratings in healthcare services, where the consequences of poor information are substantial. Drawing on the literatures on quality transparency in healthcare and word-of-mouth, we examine the information value of online physician ratings. We utilize a novel dataset that combines offline survey ratings of physician quality with consumer-generated online ratings to investigate two potential biases: (1) selection of rated physicians; and (2) selection of opinions expressed. We find that although ratings provided online are correlated with offline population ratings, online patients are less likely to rate physicians of low perceived quality. Additionally, the opinions expressed online, when compared to the opinions of the offline population, tend to exaggerate physician ratings. The existence of these biases limits the information value of online ratings. Our findings highlight the general intrinsic biases associated with online ratings and contribute to the growing body of research on user-generated content. Results also have implications for policy related to reducing information asymmetry in healthcare services.

Key Words: online ratings, physician quality, online word-of-mouth, bias estimation

A Digital Soapbox? The Information Value of Online Physician Ratings

1. Introduction

In the past decade, the emergence of Web 2.0 technologies has fostered rapid growth of online ratings by consumers. While online reviews for consumer goods such as books and movies are well-known, online ratings are increasingly expanding to professional services such as car mechanics, lawyers, and physicians. Because consumers experience difficulty in ascertaining the utility or impact of these services, the transaction between trading partners is typically characterized by significant information asymmetry and considerable risk for the buyer. In such contexts, the availability of easy-to-access online consumer reviews should, in theory, create welfare for consumers by providing them with important information to guide decision making. It is therefore not surprising that so-called "expert" services, which possess substantial experience and credence qualities, have been characterized as "natural candidates" for word-of-mouth (WOM) communication among consumers (Harrison-Walker 2001).

Within the professional services spectrum, healthcare is experiencing striking growth in terms of online WOM, specifically with regard to online ratings of physicians. Today, over forty online destinations, including Angie's List, healthgrades.com, RateMDs.com, Vitals.com, Yelp.com, and Zagat, offer opportunities for patients to rate physicians online.
Patients have welcomed the growth of online physician ratings as a convenient source of information about physician quality. Even though there is scholarly evidence that not all physicians are equal and that quality is subject to significant variation (Gawande 2002), to date, very little information about individual physician quality manages to make its way into the public domain. While much effort has been expended on improving healthcare quality transparency in institutions such as hospitals and nursing homes (Harris et al. 2008; Jha et al. 2005), less has been done at the level of individual physicians. In the absence of other channels for discerning physician quality, online physician ratings are gaining popularity among patients. The public demand for this information is accelerating: a recent survey found that 61% of US adults have looked online for health information, with 24% of those consulting online rankings or reviews of physicians or other providers (Fox et al. 2009).

Despite the increasing popularity of online physician ratings among consumers, their emergence has generated debate and controversy. Most physicians are apprehensive about being rated online by patients (Levine 2009). In the United States, professional societies such as the American Medical Association (AMA), as well as some state governments, have expressed concern that these ratings merely reflect unhappy patients' opinions and may ruin physicians' reputations, and physicians have even resorted to lawsuits against these rating websites (Solomon 2008). When the National Health Service (NHS) in the United Kingdom enabled a physician rating function on an NHS-run website, it created significant disagreement and heated argument regarding the veracity, effectiveness, and content of these reviews (Bacon 2009; McCartney 2009).

Disagreement among policymakers, consumers, and physicians about online physician ratings centers on the question of their real value in informing patients about physician quality. Although it may be a legitimate concern that such ratings are biased, and therefore might do more harm than good in terms of educating consumers, there exists limited empirical investigation, to date, of the information value of these ratings. This study seeks to fill this void and provides one of the first analyses of the information value of online physician ratings.

Based on an extensive literature survey, we first identify two biases that online ratings might suffer from: selection of physicians to rate, and selection of opinions to express. We then construct a novel dataset, which combines physician characteristics, online ratings, and local demographic variables. Importantly, we match individual physicians' online ratings to an extensive offline patient survey that supports analyses comparing online reviews to offline population opinions. Our data set allows us to assess the presence and magnitude of the two selection biases while controlling for a robust set of confounding factors.

Our findings not only yield important implications for healthcare policy; they also contribute to the online word-of-mouth research discourse. Despite the broad availability of online reviews on almost every retail website, and growing evidence regarding the relationship between reviews and outcomes, we have surprisingly little knowledge about the nature of these reviews (Moe et al. 2010).
This lack of understanding has resulted in a dearth of more sophisticated metrics that associate reviews with sales, of websites that can better facilitate knowledge sharing among consumers, and of systems to help firms manage online consumer reviews (Chen et al. 2005). The insights generated from this study shed light on these important issues as well.

The rest of this paper proceeds as follows. We begin with a brief overview of two streams of literature that provide the foundation for the study: quality transparency in healthcare services, and online WOM. Section 3 builds upon these bodies of work and develops the research hypotheses. In Section 4 we describe the data and the empirical strategy, and present results and robustness checks. Section 5 discusses the implications of the findings and concludes the paper.

2. Background and Prior Literature

This study draws on two streams of prior work: quality transparency in healthcare, and online word-of-mouth. Research in the former stream exists across the economics, healthcare management and informatics, and information systems literatures, while the latter body of work is primarily in the marketing and, more recently, in the information systems literature.

2.1 Quality Transparency in Healthcare

Among the range of expert services that consumers utilize, such as lawyers, plumbers, and car mechanics, healthcare services is arguably a category where consumer choice may have life and death ramifications. To the degree that physicians serve as patients' agents, providing expert knowledge and judgment, choosing a good physician can have wide-reaching clinical implications. Because this decision is wholly up to the patient, transparent and accurate quality measures are crucial in selecting a health care provider. Quality measures may also affect healthcare providers by influencing their compensation and reputation. Challenges arise because physician quality is difficult for consumers to measure, as outcomes are uncertain, search is difficult, and consumers typically lack specialized clinical knowledge (Arrow 1963).

Policymakers in the United States have been proactive about addressing consumers' need for quality information on healthcare services. For over two decades they have exhorted hospitals to become more transparent about their performance and to disclose quality information to patients. In the ongoing efforts to inform consumers, the federal government, under the auspices of the Centers for Medicare and Medicaid Services (CMS), the largest healthcare payer in the US, initiated the HospitalCompare program. This program provides consumers with a range of hospital performance metrics on a public website (http://www.hospitalcompare.hhs.gov). A similar initiative for nursing homes supports comparative analysis of the quality of nursing services (http://www.medicare.gov/NHCompare/), and the Physician Quality Reporting System (PQRS) will eventually provide physician ratings (although CMS currently collects and analyzes PQRS data for participating Medicare providers, these data are not yet made public at the physician level; see http://www.cms.gov/pqrs/ for further detail).

Although quality information about hospitals and nursing homes is doubtless useful for consumers, the utilization of these facilities is limited to less frequently occurring acute care needs or chronic and end-of-life situations. Overall, consumption of healthcare services occurs predominantly via consultations with a primary care physician. Yet, despite physicians' central role in health care delivery, consumers have limited access to systematic physician quality information (Christianson et al. 2010; Harris et al. 2008; Jha et al. 2006).
Traditionally, consumers have relied on word-of-mouth references from friends and family when choosing physicians. In the absence of broadly available and easily accessible formal quality ratings of physicians, the internet has provided a forum for consumers to fill this information vacuum. As noted, online consumer ratings of physicians have grown rapidly. A recent study estimates that the number of ratings has grown at an annual rate of over 90% over the past several years (Gao et al. 2010). Nearly 60% of US adults have used online health information resources (Fox et al. 2009) – a three-fold increase from 2001 (LeGrow et al. 2001). Furthermore, many consumers use online information to select physicians (Fox et al. 2009).

The concerns about such ratings voiced by professional medical organizations and other advocacy groups (Dolan 2008; Martin 2009) are predicated on the assumption that consumers lack the detailed clinical knowledge and understanding needed in order to make accurate quality assessments. Thus, results may represent a disgruntled minority rather than reflecting average patient experiences. Additionally, because of the inherent information asymmetry in the physician-patient encounter, consumer ratings may provide negligible or even misleading information. This assertion has not, however, been verified empirically, and theory does not provide a clear answer to the question of information value. While recent studies find that online consumer ratings are predominantly positive (Gao et al. 2010; Lagu et al. 2010), we have limited understanding regarding the mechanisms through which online ratings occur, or the representativeness of these ratings. Understanding the mechanisms underlying the rating process is crucial to determining the meaning and value of online rating systems. We build on this literature by combining online physician rating data with patient surveys conducted offline. These data are used to understand not only which physicians receive online ratings, but also the extent to which these ratings are representative of the average patient population.

2.2 Online Word-of-Mouth

Online reviews of products and services are part of the broader phenomenon of online word-of-mouth (WOM). WOM has a decades-old tradition of being acknowledged as an important determinant of consumer choice (Arndt 1967; Bass 1969; Butler 1923; Katz et al. 1955). Consumers often ask friends and colleagues to recommend restaurants when they dine in unfamiliar areas, and consult users of a product about their experience before buying a similar item. Until about ten years ago, WOM was, by its very nature, generally limited to individuals with close social connections (Brown et al. 1987) and/or close physical proximity. However, in recent years, the internet has enabled online communities of consumers to exchange experiences and opinions. As a result, the past decade has witnessed an increasing number of studies examining the phenomenon of online consumer reviews (Clemons et al. 2006; Dellarocas et al. 2004; Dellarocas et al. 2010; Duan et al. 2008; Godes et al. 2004; Li et al. 2008; Moe et al. 2010; Zhu et al. 2010).
Although a complete review of this literature is beyond the scope of this paper, we briefly summarize key findings and highlight gaps in understanding that our study seeks to address.

Two conclusions can be drawn from a review of the literature on digital WOM. First, we note that a majority of these studies focus on establishing the value proposition of digital WOM by relating consumer reviews to financial measures. In this work, consumer reviews have been examined from various perspectives, including the dispersion of WOM (Godes et al. 2004), the distribution of the ratings (Clemons et al. 2006), product segment (Dellarocas et al. 2010), and consumer characteristics (Zhu et al. 2010). As one example, a recent study by Chintagunta et al. (2010) investigates the effects of the valence, volume, and variance of national online user reviews on the box-office performance of feature films and finds stronger effects for review valence than for volume. All these studies have provided important insights into the relationship between online consumer reviews and sales; however, they have stopped short of investigating the value of the information in these ratings, which is the focus of this study.

Second, the majority of these studies have been conducted for consumer products such as TV shows (Godes et al. 2004), books (Chevalier et al. 2006), movies (Dellarocas et al. 2010; Duan et al. 2008), games (Zhu et al. 2010), and beauty products (Moe et al. 2010). Notably lacking are studies focused on consumer reviews for professional services, where there exists even greater information asymmetry on provider quality. Although healthcare is the biggest business sector of the US economy and accounts for 16% of GDP, only a handful of studies have begun to explore the nascent phenomenon of online physician ratings (Gao et al. 2010; Lagu et al. 2010). The emphasis in this early work has been on understanding and describing an emerging trend, and these studies tend to be descriptive in nature. No work we are aware of has investigated the question of whether online reviews of physicians are subject to bias and undertaken systematic research to provide quantitative estimates of the bias.

3. Theory and Hypotheses

3.1 The Nature of Bias in Online Reviews

A distinctive feature of online ratings is that they are voluntarily posted by users, i.e., they reflect intentional public disclosure of privately held information. The existence of a variety of biases in such public reporting of consumption and trading experiences is acknowledged and well-documented in the extant literature. Biases are typically a result of the underlying social motivations and preferences of reviewers.

One bias arises as a result of the social context of online reviews. Reviewers who post their product judgments do so in a context where others observe them and possibly use the information to draw conclusions about the reviewer, thereby creating social pressure for the focal reviewer. Studies indicating that perceived social pressure influences ratings beyond what the consumer privately experiences include Moe and Trusov (2010) and Wang, Zhang, and Hann (2010). Moe and Trusov argue that, via mechanisms related to social influence, the valence of previously posted ratings affects future ratings behavior, and that these ratings dynamics, in turn, influence sales. Wang, Zhang, and Hann (2010) posit that the social pressure experienced by raters from their online "friends" affects ratings.
After controlling for homophily effects, they find that social influence is most potent when earlier ratings by friends are negative. Their study, conducted in the context of books, further reveals that the strength of social influence is conditioned by a variety of factors such as the popularity of the book, stage of the review cycle, experience of the user, and the size of the user's social network. Thus, to the degree that social influence modifies the truthful reporting of experienced product quality, it reduces social welfare by misleading users whose consumption choices are driven by the online reviews.

Researchers have also argued for and demonstrated the existence of temporal effects in ratings. Early experimental work by Schlosser (2005) revealed that the presence of negative reviews results in downward adjustments in product evaluations for future review contributors, causing them to not reveal their true assessments. She attributes such behavior to rater psychology whereby those holding a negative opinion are perceived to be more intelligent and discriminating than those with a positive evaluation. Temporal effects on ratings can be potentially problematic for consumers of the reviews, even when the veracity of the rating is not compromised and it reflects the reviewer's true experience. Arguing that the preferences of those who purchase early are systematically different from those of late buyers and hence are a source of bias, Li and Hitt (2008) show that early consumer reviews of books on Amazon.com are overwhelmingly positive, and that subsequent purchasers do not adjust for this positive bias when drawing inferences about product quality. Self-selection into early purchase caused by ex ante preference differences has implications for future demand.

Hu, Pavlou, and Zhang (2006) isolate two biases in online ratings that together result in a J-shaped distribution for product reviews, where an overwhelming majority of ratings are at the most positive end of the scale (5 stars), a small number at the low end (1 star), and very few in between the two extremes. Noting that the shape of this distribution is inconsistent with the expectation of a normal curve with a large number of reviews, they characterize the self-selection biases that lead to the J-shaped distribution as a purchasing bias and an under-reporting bias. The former bias, echoing the notion of heterogeneity in ex ante consumer preferences for products identified by Li and Hitt (2008), arises because product purchases are made by those with higher product valuations, thereby inflating the average rating for a product. The under-reporting bias results from a higher propensity to rate a product among consumers who experience extreme satisfaction or dissatisfaction, as compared with those who like or dislike the product only moderately. Hu et al. (2009) conclude that the average rating is not an accurate reflection of true product quality and recommend that consumers examine the entire rating distribution.

To summarize, multiple underlying causes have been implicated in the bias observed in online reviews. Causes of bias include systematic product preference differences among those who elect to provide an online review and those who do not, the social context within which reviews occur, such as the size of an individual's friendship network, and the valence and volume of prior reviews.
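To see how a purchasing bias and an under-reporting bias can jointly distort observed ratings, the following simulation may help. It is a minimal sketch under assumed functional forms: the latent quality distribution, the logistic purchase rule, and the extremity-based reporting rule are illustrative choices, not quantities estimated in this paper or in the studies cited.

```python
import numpy as np

rng = np.random.default_rng(42)

# Latent satisfaction for the full population of potential buyers on a
# 1-5 scale; the normal form and all parameters are illustrative.
quality = np.clip(rng.normal(3.3, 1.0, size=200_000), 1, 5)

# Purchasing bias: consumers with higher ex ante valuations are more
# likely to buy, so buyers' experiences skew positive.
p_buy = 1 / (1 + np.exp(-(quality - 2.5)))
bought = rng.random(quality.size) < p_buy

# Under-reporting bias: the propensity to post rises with the extremity
# of the experience (brag or moan), so moderate opinions go unreported.
extremity = np.abs(quality - 3.0)
p_report = np.clip(0.02 + 0.2 * extremity**2, 0, 0.9)
reported = bought & (rng.random(quality.size) < p_report)

stars = np.rint(quality[reported]).astype(int)
print("population mean:", quality.mean().round(2))
print("reported mean:  ", quality[reported].mean().round(2))
print("star histogram: ", np.bincount(stars, minlength=6)[1:])
# The reported mean overstates the population mean, and the reported
# histogram is top-heavy with a thinned-out middle relative to the
# population distribution of experiences.
```

Under these assumptions, the reported average exceeds the true population average and the extremes are over-represented relative to moderate experiences, mirroring the positive skew and hollow center that the cited studies document.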
To an extent, the presence of such bias is inevitable; it is important, however, for researchers and policymakers to understand and circumvent the information loss caused by the bias. The predominant focus in prior work has been on self-selection into providing a review, rather than systematic differences related to the object of the review. We address this gap by investigating two biases in the context of physician ratings. The first, a selection bias, is related to consumers' choice of which physicians to rate online. Drawing on prior findings and related theory from marketing and psychology, we offer two alternative predictions with distinct underlying theoretical mechanisms for which physicians receive ratings online: a "sound of silence" explanation versus a "naming and shaming" explanation. The second bias we explore is in the choice of what opinions to express online. Labeling this bias the "hyperbole effect," we predict that online reviews of doctors are less informative at the two ends of the population opinions. Specific hypotheses related to these biases are developed below.

3.2 "Naming and Shaming" vs. "Sound of Silence" in Physician Selection

Consumers volitionally choose to go online and post reviews of their physicians. In the absence of distortions in user choices about which physicians to rate online, those who are rated online should be a representative sample of the underlying population. Yet, prior research in marketing has extensively documented the phenomenon of customer complaining behavior (CCB) in both online and offline settings (Bearden et al. 1979; Cho 2002; Cho et al. 2001). Singh et al. (1988) define CCB as "a set of multiple behavioral and non-behavioral responses, some or all of which are triggered by perceived dissatisfaction with a purchase episode." Prior research has also suggested that consumers' propensity to complain is positively associated with their level of dissatisfaction, the importance of the "purchase," the relative benefit-cost ratio of complaining, and their personal competence (Bearden et al. 1979; Cho 2002; Cho et al. 2001; Landon 1977). Early work by Richins (1983) established that minor dissatisfaction evoked minimal response from consumers, while more severe dissatisfaction was strongly correlated with the consumer response of "telling others."

To the extent that physicians who are lower in quality (as judged by patients) are more likely to evoke dissatisfaction, we can expect more CCB for this group of physicians. Indeed, critics worry that internet rating sites for physicians will become forums for disgruntled patients to vent their anger (McCartney 2009; Miller 2007). If this "naming and shaming" assumption is correct, then we would expect to see a disproportionately higher number of low quality physicians being rated online. In this instance, the information value of the ratings for consumers is considerably degraded: even if a physician is rated highly compared to others, consumers still need to exercise extra caution when making inferences about the quality of that physician, because s/he may simply be the "best of a bad lot." Thus, we predict:

H1a: Online ratings of physicians are subject to the phenomenon of "naming and shaming" in that physicians of lower quality are more likely to be rated online.

Drawing on findings related to CCB, we have argued that dissatisfied patients are more likely to complain, yet research also exists to suggest that consumers are much less likely to speak out about a negative experience (Dellarocas et al.
2008). Therefore, an alternative possibility exists that online ratings yield a "sound of silence" effect, in which low quality physicians receive fewer ratings because individuals prefer to keep their damaging evaluations private. In their study of eBay's feedback mechanisms, Dellarocas and Wood (2008) characterize a reporting bias in which the propensity to reveal private opinion is a function of the outcome experienced by the reviewer. When the outcome is negative, traders choose not to provide low ratings for fear of retaliation from trading partners. Their empirical results show that eBay traders are significantly more likely to report satisfactory outcomes than those that are moderately unsatisfactory. The limited presence of negative WOM is further corroborated by a number of other studies. Chevalier and Mayzlin (2006) find that consumer reviews of books are strikingly positive at Amazon.com and Barnesandnoble.com. East et al. (2007) review 15 studies and report that positive WOM is more pervasive than negative WOM in every case, with positive ratings/reviews exceeding negative ones by a factor of 3. Hu, Pavlou, and Zhang's (2006) identification of the J-shaped distribution of product ratings is, similarly, an outcome of a disproportionately large number of positive valuations.

A patient could be motivated to remain silent after a dissatisfying experience with a physician for a variety of reasons. It is well recognized that the patient-physician relationship is characterized by substantial information asymmetry where the physician, by virtue of specialized knowledge and training, is the "expert," and the patient has limited understanding of medical procedures, diagnoses, and treatments. Moreover, society accords physicians a unique status because of the nature of their work: saving lives. In such situations, the patient experiences lower power and competence and may attribute the negative experience to their own shortcomings. When such an attribution of blame to self occurs, the patient is less likely to report the negative experience. Moreover, unlike the consumption of goods such as movies or books where there is typically no personal relationship between sellers and buyers, the patient-physician relationship is generally deeply personal and enduring. Insurance requirements and other logistical barriers create significant switching costs to move from one physician to another. In the patient's mind, public reporting of a negative experience may evoke the fear of physician retaliation, much as in the case of trading partners in an online auction, albeit in a different form. Physicians' reprisals could take the form of, at best, interminable delays in getting the next appointment, or, at worst, litigation for public libel. Indeed, in the highly litigious profession of medicine, the latter form of reprisal is not implausible. Under the assumption of this form of behavior, where patients experiencing negative physician encounters volitionally remain silent, we would expect high quality physicians to be more likely to be rated online. Following this logic we predict:

H1b: Online ratings of physicians are subject to the "sound of silence" phenomenon in that physicians of lower quality are less likely to be rated online.

We note that H1a and H1b indicate contrasting predictions, yet it is possible that both biases co-exist. Neither prior literature nor theory has identified which bias is more dominant.
We provide empirical insights into whether the "sound of silence" effect or the "naming and shaming" effect prevails in online physician ratings.

3.3 Hyperbole Effects

The second bias arises from the valence of the rating itself. Substantial prior research has documented that individuals are more likely to talk about extreme experiences (Anderson 1998; Dellarocas et al. 2006; Hu et al. 2009). Anderson (1998) argued that the marginal utility a consumer derives from WOM activity increases as satisfaction or dissatisfaction with the product increases, resulting in a U-shaped curve for the valence of WOM activity. Empirical data from a cross-cultural study of customer satisfaction in the United States and Sweden support the posited functional form (Anderson 1998). Theoretically, the sharing of positive experiences has been ascribed to a variety of motivations, including altruism (providing useful information for the benefit of others), instrumental motives (the desire to appear well informed), and general cognitive biases that favor positive incidents (Arndt 1967; Dichter 1966; Holmes et al. 1977). Motivational mechanisms have also been implicated in high levels of dissatisfaction giving rise to negative WOM: Hu, Pavlou, and Zhang (2009) label this "moaning" about the experience. Consumers who are highly dissatisfied may desire to express their hostility (Jung 1959), or seek vengeance and retribution (Richins 1983). Cognitively, consumers who are highly satisfied or highly dissatisfied are argued to derive homeostase utility from communicating their experiences: "expressing positive emotions and venting negative feelings" (Henning 2004). Homeostase utility is predicated on the fundamental tenets of balance theory or homeostasis, suggesting that individuals strive for balance and will seek to restore equilibrium after an "unbalancing experience." Highly positive and highly negative (i.e., extreme) consumption experiences threaten the individual's sense of balance and motivate them to externalize their feelings by expressing opinions about the consumption experience, which will then restore balance. We label this the "hyperbole" effect in online ratings. Since physicians at the ends of the quality spectrum are more likely to evoke extreme opinions or hyperbole, their ratings should contain less information value in reflecting the population opinion. We predict:

H2: Online physician ratings exhibit the "hyperbole effect," such that the online ratings for high-end and/or low-end physicians are less informative in reflecting population opinions than the ratings for average physicians.

4. Methodology and Results

4.1 Data

We construct the data set for empirical testing of the research hypotheses by matching observations across four different datasets. The first dataset is an offline patient survey conducted by the consumer advocacy group Consumers' Checkbook, in collaboration with five major health plans (Aetna, UnitedHealthcare, CIGNA HealthCare, BlueCross and BlueShield of Kansas City, and BlueCross BlueShield of Tennessee). The questionnaire and survey procedure were developed and tested by the U.S. Agency for Healthcare Research and Quality and endorsed by the National Quality Forum, to ensure the representativeness of the patient sample for each physician. Patients rate physicians on a scale of 1 to 100. The survey was conducted in three major metropolitan areas within the United States (Denver, Memphis, and Kansas City).
As with most surveys, the representativeness of the offline patient data could be a matter of concern; however, the rigor with which the Consumers' Checkbook survey was conducted serves to mitigate this issue. After pooling health plan claims data for physicians working with the major health plans engaged in this effort, surveys were sent, on average, to 145 patients of each physician (minimum 113). Roughly 50 surveys per physician were returned (a response rate ranging from 32% in Denver to 42% in Kansas City), yielding coverage of 99.14% of the physicians investigated.

A second dataset (the online ratings of physicians) is drawn from RateMDs.com, one of the nation's largest aggregators of consumers' online physician ratings. Established in 2004, RateMDs.com has accumulated over 1 million doctor ratings. According to a recent comparison of physician rating websites, RateMDs.com has the most ratings with patient narratives among major players (Lagu et al. 2010).

The third dataset is the 2007 Economic Census (conducted by the US Census Bureau), from which we gather information regarding the population and economic conditions of the areas in which the physicians practice. The Economic Census collects quinquennial data at the county level relating to population income, poverty, demographics, and the emergence of new firms, along with economic indicators regarding the general fiscal health of the area. It is considered the definitive source for information related to geographic variation in economic conditions.

The final data source comes from the state medical board websites that are responsible for physician accreditation, licensing, and disciplinary oversight. Medical boards collect comprehensive information on doctors that practice in the respective states, and we use their data for physician characteristics such as board certification, physician experience, and specialty.

Our sample is constructed to reduce sample heterogeneity and control for potential unobserved variable bias. Because consumer choice related to specialist physicians for particular procedures (such as brain surgery) occurs less frequently, we restrict our sample to general practitioners and family care physicians, whose services are used by all types of patients. The resulting dataset is composed of 1,425 physicians with completed offline surveys, of whom 794 have been rated online.

We next introduce our specific variables of interest and discuss their operationalization. Table 1 provides a summary of the variables. Our empirical strategy uses two different dependent variables. To determine the propensity to receive an online rating we use a dichotomous measure equal to one if a physician is Rated Online at least once and zero otherwise. This variable does not, however, directly measure the information value of the online ratings. To examine information value, we use the average Online Rating conditional upon being rated online at least once. Although the results are not reported, we also examined the number of online ratings (Rating Count) to test the robustness of our conclusions.

The independent variables of interest comprise the patient survey results administered by Checkbook. Our primary measure is the physician's average survey score, Survey Rating, based on all patients who participated in the offline survey. We also control for the number of surveys completed, Surveys Filled. The online and population ratings are subjective evaluations of quality from the patient's perspective.
As discussed earlier, there is considerable controversy and ambiguity surrounding the measurement of physician clinical quality and the appropriate objective metrics for quality assessment. Because consumers lack the clinical training and expertise required to evaluate the true quality of medical care, their opinions may simply reflect the quality of factors such as inter-personal interaction (e.g., bedside manner, communication skills, and punctuality). These assessments constitute useful information for future consumers, and evidence suggests that they guide consumers' choice processes (Fanjiang et al. 2007). Additionally, the patient perspective is increasingly used as a measure of the quality of care, as reflected in the HCAHPS initiative endorsed by the National Quality Forum (http://www.hcahpsonline.org/home.aspx). Commencing October 2012, patient satisfaction will also impact Medicare reimbursement (Rangel 2010). Finally, patients' satisfaction with the relationship between them and their healthcare provider has been argued to have positive effects on clinical outcomes (Kane et al. 1997).

In addition to these variables, we control for a number of physician-specific characteristics that may influence both the propensity for the physician to receive an online rating as well as the score received. These include dichotomous indicators for physician Gender (equal to one if male and zero otherwise) and for physician certification by the state medical board (Board). We control for the number of years the physician has practiced medicine, Experience, which is the number of years elapsed from medical school graduation to the current year. We control for the number of times the physician received endorsements (Peer Rating) from physicians in another Checkbook survey.

Because physician ratings may be influenced by broader market dynamics, we also control for a series of economic factors. These market variables are operationalized at the county level. First among these is the Population of the county in which the physician practices. Moreover, to account for both the geographic and digital competition the physician is facing, we control for the number of physicians who have received Survey Ratings in the focal physician's ZIP code, Physician Count, as well as the number of physicians in the focal physician's ZIP code who have received an Online Rating, Rated Physician Count. As a physician's patients may also be subject to economic constraints, we also control for the Median Income of the populace in the county in which the physician practices. Finally, we control for the level of urbanization in the city in which the physician practices (Urban and Large Urban indicating municipalities with more than three or ten ZIP codes, respectively), as well as dummy variables indicating the metropolitan area (Memphis and Denver; Kansas City serves as the excluded category).

Summary statistics for all variables are available in Table 1 and a correlation matrix is presented in Table 2.
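To make this construction concrete, the sketch below assembles stand-ins for the four sources and builds the key variables. All file and column names are hypothetical, since the underlying data are proprietary; the sketch only illustrates the matching logic described above.

```python
import pandas as pd

# Hypothetical inputs standing in for the four proprietary sources.
survey = pd.read_csv("checkbook_survey.csv")   # physician_id, survey_rating, surveys_filled, peer_rating
reviews = pd.read_csv("ratemds_reviews.csv")   # physician_id, rating (one row per online review)
census = pd.read_csv("census_2007.csv")        # county_fips, population, median_income
boards = pd.read_csv("state_boards.csv")       # physician_id, specialty, gender, board, grad_year, county_fips, zip

# Restrict to general practitioners and family care physicians.
gp = boards[boards["specialty"].isin(["general practice", "family medicine"])]

# Collapse individual online reviews to one row per physician.
online = (reviews.groupby("physician_id")["rating"]
                 .agg(online_rating="mean", rating_count="size")
                 .reset_index())

df = (survey.merge(gp, on="physician_id")
            .merge(census, on="county_fips")
            .merge(online, on="physician_id", how="left"))

# Dependent variables: Rated Online, and Online Rating conditional on it.
df["rated_online"] = df["online_rating"].notna().astype(int)

# Physician and market controls described in the text.
df["experience"] = 2011 - df["grad_year"]  # years since medical school graduation
df["physician_count"] = df.groupby("zip")["physician_id"].transform("size")
df["rated_physician_count"] = df.groupby("zip")["rated_online"].transform("sum")
```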
4.2 Analysis

Likelihood of Online Rating

Despite the recent rapid growth in online consumer ratings of physicians, a substantial portion of doctors have yet to receive a consumer-generated online rating. To the extent that this variation may be driven by factors uncorrelated with physician quality, the absence of a rating would provide no information regarding physician quality. Alternatively, selection into online ratings could reflect the perceived quality of physicians. Hypotheses 1a and 1b offer alternative explanations for online rating selection, where a physician's likelihood of being rated online is affected by her quality.

We first explore our data visually. Figure 1 describes the Survey Rating distributions for physicians with and without online ratings. On average, Survey Rating is higher for physicians with at least one online rating than for other physicians. These descriptive patterns are inconsistent with Hypothesis 1a (naming and shaming), but provide some support for Hypothesis 1b (sound of silence).

Based on the initial evidence from Figure 1, we formally test Hypotheses 1a and 1b by modeling the probability that a physician receives an online rating as:

$y = \alpha_1 + \beta_1 s_1 + \gamma_1 s_2 + X\delta_1 + M\theta_1 + \nu$,  (1)

where y is an indicator equal to one if a physician is rated and equal to zero otherwise. The variable s1 is the physician's Survey Rating and s2 is the number of surveys filled, while X and M are vectors of physician and market characteristics, respectively. The terms {α1, β1, γ1, δ1, θ1} are parameters to be estimated and ν represents an error term. The specification in (1) is estimated using logistic regression.

The first column in Table 3 reports parameter estimates for Equation 1. We find that the coefficient of Survey Rating is 0.0135, and significant at the p<0.10 level. This suggests that higher Survey Ratings increase the probability of receiving an online rating, therefore supporting the "sound of silence" effect.

In our baseline specification, we assume a linear relationship between Survey Rating and the likelihood of being rated online. We next relax this restriction by allowing the coefficient of Survey Rating to vary across its own distribution. The second specification in Table 3 builds on Equation 1 to allow the probability of receiving an online rating to vary across the lower, middle (the excluded category), and upper terciles of Survey Rating. The third model allows the relationship to vary across quartiles of Survey Rating (where the third quartile is the default group). Interestingly, we see a significant negative correlation between the lowest segment indicator and the probability of being rated online. The lower tercile coefficient is -0.342 (p<0.05), and the lower quartile coefficient is -0.408 (p<0.05). Evaluated at the average values of other variables, a physician in the bottom quartile has a 10.1 percentage point reduction in the probability of being rated online compared to an average physician. Moreover, we do not detect a significant difference in the probability of being rated online in other ranges of Survey Rating. In other words, high quality physicians are as likely as average physicians to be rated online. This provides strong evidence for rejecting Hypothesis 1a and supports the dominance of the "sound of silence" effect.
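For concreteness, here is a minimal sketch of how Equation (1) and its tercile variant could be estimated with the statsmodels formula API. It continues from the hypothetical df assembled earlier and assumes the remaining indicator columns (urban through denver) exist; it illustrates the specification, not the paper's actual estimation code.

```python
import pandas as pd
import statsmodels.formula.api as smf

controls = ("surveys_filled + gender + board + experience + peer_rating"
            " + population + median_income + physician_count"
            " + rated_physician_count + urban + large_urban + memphis + denver")

# Baseline logit of Equation (1): linear in Survey Rating.
base = smf.logit("rated_online ~ survey_rating + " + controls, data=df).fit()

# Tercile specification: lower and upper thirds of Survey Rating,
# with the middle tercile as the excluded category.
df["sr_tercile"] = pd.qcut(df["survey_rating"], 3, labels=["low", "mid", "high"])
terc = smf.logit("rated_online ~ C(sr_tercile, Treatment('mid')) + " + controls,
                 data=df).fit()

# Average marginal effects, comparable to the percentage-point changes
# in rating probability reported in the text.
print(terc.get_margeff(at="overall").summary())
```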
Value of Online Ratings

Next, we examine how well the online rating reflects the underlying quality perceived by the broader patient population, conditional on the physician being rated online. We first conduct an overall assessment of the correlation between the two types of ratings, and then formally test whether we observe hyperbole effects toward the extremes of the quality spectrum.

Figure 2 presents the distribution of Online Rating for physicians with relatively low, medium, and high Survey Rating. A number of interesting observations can be made from these data. Overall, we find that physicians with higher Survey Rating receive higher Online Rating. For example, physicians in the highest tercile of Survey Rating overwhelmingly receive an Online Rating of 5. On the other hand, high ratings are less prevalent for physicians in the middle and lowest terciles. This provides evidence that Online Rating is positively correlated with Survey Rating and that the former does provide useful information for consumers.

We test this hypothesis using a variety of regression techniques. We model the value of Online Rating (R) as:

$R = \alpha_2 + \beta_2 s_1 + \gamma_2 s_2 + X\delta_2 + M\theta_2 + \varepsilon$.  (2)

We estimate simple linear specifications of s1, reported in columns 1 through 3 in Table 4. Column 1 of Table 4 reports the OLS estimation of Equation (2). We detect a strong and significant correlation between the Survey Rating and Online Rating (p<0.001). A 10-point increase in the Survey Rating (measured on a 100 point scale) leads to a 0.469 point increase in the Online Rating (which is measured on a 5 point scale). As the range of Online Rating is truncated and the distribution is skewed, we employ alternative estimation techniques. Column 2 reports the results from the Tobit specification. As expected, the magnitude is even greater in the Tobit estimate (0.0732, p<0.01), indicating that Online Rating is informative and reflects population opinions. The ordered logit yields almost the same finding as that of the Tobit model, both in magnitude and statistical significance. This not only confirms the positive relationship, but also suggests that it is reasonable to use the value of the ratings for further analysis; however, these specifications lack the flexibility needed to test for bias or exaggeration in Online Rating.

Hypothesis 2 posits that the online ratings should contain less information value in discriminating among individual physicians based on quality for those with particularly high or low Survey Ratings. To test this hypothesis, we need to examine the relationship between online and offline ratings in a disaggregated fashion, allowing for variation in this relationship across the population opinion. Thus, we incorporated splines of Survey Rating, denoted by f(s1; κ), where κ is the vector of knot locations, in a series of regressions. These regressions explore heterogeneity in the relationship between Online Rating and Survey Rating across the distribution of Survey Rating. Splines create a piecewise specification where the functions are spliced together at pre-defined intervals based on the distribution of Survey Rating values (Robb 1980; Suits et al. 1978). Linear splines impose a linear specification on parameter values between knots, while cubic splines allow for more flexible relationships (knots are set at 75 and 86 for the linear spline models and at 66 and 79 for the cubic spline models, with the cubic spline knots based on the distribution of Survey Rating following Harrell 2001). These models create local flexibility without creating discontinuities (Kennedy 2003):

$R = \alpha_3 + f(s_1; \kappa) + \gamma_3 s_2 + X\delta_3 + M\theta_3 + \varepsilon$.  (3)
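As a sketch of the linear spline specification in Equation (3), the code below builds a hinge basis with the knots at 75 and 86 noted above and fits the OLS variant. It reuses the hypothetical df from the earlier sketches and includes only two controls for brevity; under this parameterization the hinge coefficients are slope increments, so the per-segment slopes are cumulative sums.

```python
import numpy as np
import statsmodels.api as sm

def linear_spline(x, knots=(75, 86)):
    """Hinge basis: x, (x-75)+, (x-86)+; segment slopes are cumulative sums."""
    return np.column_stack([x] + [np.clip(x - k, 0, None) for k in knots])

rated = df[df["rated_online"] == 1]  # Online Rating is observed only here
basis = linear_spline(rated["survey_rating"].to_numpy())
X = sm.add_constant(np.column_stack(
    [basis, rated[["surveys_filled", "experience"]].to_numpy()]  # plus the other controls
))
fit = sm.OLS(rated["online_rating"].to_numpy(), X).fit()

b1, b2, b3 = fit.params[1:4]
print("segment slopes:", b1, b1 + b2, b1 + b2 + b3)  # low / middle / high
```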
Results are presented in Table 5. Columns 1 through 3 include linear splines of Survey Rating and Columns 4 through 6 use cubic splines. Specification 1 of Table 5 is estimated using OLS and generates separate parameter estimates for three segments of physicians' Survey Rating: the lower quartile, the middle half, and the top quartile. The Survey Rating coefficient is 0.0397 for the lower quartile, and significant at the p<0.01 level. This means that a 10-point increase in Survey Rating leads to a 0.397 point increase in the Online Rating. Interestingly, the Survey Rating coefficient increases to 0.0606 (p<0.01) in the middle half segment. This is 52% higher than the lower quartile Survey Rating coefficient, implying that Online Rating is sensitive to Survey Rating near the middle of the distribution. We do not, however, find a significant correlation between Survey Rating and Online Rating in the highest quartile of Survey Ratings, where the coefficient is relatively small (β1 = 0.0167, p = 0.39), suggesting that online ratings are not sensitive to survey ratings at the high end of the distribution.

Identifying the information value of Online Rating depends on jointly examining the intercept, slopes, and confidence intervals. Consequently, we plot the predicted values of Online Rating for ease of interpretation. In an ideal scenario, Online Rating would accurately reflect Survey Rating; thus, it is useful to compare the predicted values with a theoretical scenario with a linear mapping between the two ratings, which would have an intercept of 1 and a slope of 0.04 (a linear rescaling of the 1-100 survey scale onto the 1-5 online scale). Predicted values and confidence intervals from Column 5, Table 5 are plotted in Figure 3, along with the theoretical perfect information line. Figure 3 demonstrates that there is a positive correlation between Online Rating and Survey Rating at the low end. The slope doubles between the first and second knots and is significantly different from the perfect information line. The slope then falls again in the final segment, with the predicted values remaining significantly higher than the perfect information line. Note that the prediction is based on a Tobit model. Because the observed online ratings reach a ceiling at the high end boundary of 5 out of 5, users will not be able to discern differences among physicians with predicted values exceeding 5.

Overall, we find that Online Rating is relatively effective at distinguishing physicians of average quality. However, Online Rating has less discerning power in separating low end physicians, and almost no ability to distinguish quality variation within the top quartile of Survey Rating (where most online ratings are capped at 5 out of 5). With these findings, we find support for H2 and conclude that online ratings of physicians are subject to the "hyperbole" effect.
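To make the Figure 3 benchmark concrete, the short sketch below traces the perfect information line and applies the 5-star ceiling to predictions from the hypothetical spline fit above (it reuses fit, rated, and linear_spline from the previous sketch; the grid range is an assumption).

```python
import numpy as np

grid = np.linspace(55, 100, 46)   # assumed range of Survey Rating values
perfect = 1 + 0.04 * grid         # linear rescaling of 1-100 onto 1-5

# Predicted Online Rating along the grid, holding the two example
# controls at their sample means.
means = rated[["surveys_filled", "experience"]].mean().to_numpy()
Xg = np.column_stack([np.ones_like(grid),
                      linear_spline(grid),
                      np.tile(means, (grid.size, 1))])
predicted = Xg @ fit.params

# Ratings sites cap scores at 5 stars, so physicians whose predicted
# value exceeds 5 cannot be distinguished from one another online.
print(np.round(np.minimum(predicted, 5), 2))
```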
4.3 Robustness

The conclusions drawn in Section 4.2 are subject to a number of assumptions. The relationship between Online Rating and Survey Rating is plausibly affected by several factors; for example, continuous and Likert scales usually yield different distributions. We therefore performed a wide range of specification tests to explore this issue. First, we employed several different estimation strategies, as demonstrated in Table 4. Second, we estimated alternative spline functions to those reported in Table 5, choosing different numbers of knots and changing knot locations. Third, we used alternative specifications of f in Equation 3, including polynomial and categorical Survey Rating measures. Every specification supported the two crucial features of Figure 3: 1) the predicted line is monotonically increasing in Survey Rating; and 2) Online Rating is exaggerated for the upper tail of Survey Rating.

One remaining concern is that the Online Rating equations suffer from selection bias based on individual consumers' experiences. While Survey Rating may be an unbiased measure of average perceived quality, a consumer's rating is undoubtedly sensitive to their individual experiences. This potential form of selection would be a concern if we were attempting to measure the causal effect of Survey Rating on consumer experiences. (One approach would be to employ a Heckman selection model to jointly estimate selection into being rated online and the value of Online Rating, with geographic variation in the propensity to receive an Online Rating as a potential instrument; however, this approach should be applied with caution given that the instruments are weak and the data do not fit the distributional assumptions of the selection correction model.) Our objective, however, is to measure if, and when, Online Rating accurately reflects Survey Rating, irrespective of the mechanism driving such variation.

5. Discussion and Conclusion

In this paper, we examined the information value of online physician ratings. Based on a review of related literature, we identified two potential sources of bias inherent in online ratings: the selection of physicians to rate, and the selection of opinions to express. We then derived formal hypotheses related to these two biases, which were subsequently tested on a unique dataset of 1,425 primary care physicians in three metropolitan areas: Denver, Kansas City, and Memphis. This study represents one of the first efforts to quantify the information value of online ratings.

We have three notable and interesting findings. First, physicians with low survey ratings are less likely to be rated online, thereby supporting the predicted existence of the "sound of silence" effect. Second, we find there is a strong correlation between the online ratings and offline population opinion. Third, the association is strongest in the middle of the physician quality spectrum, and less strong at the high end. These results suggest that online ratings are more informative of differences between physicians in average and low quality segments, but less effective in differentiating quality among high quality physicians. Given that patients are in greater need of avoiding bad physicians than they are of finding the very best physicians, these findings are encouraging. They suggest that the information provided by online ratings has substantial social welfare implications.

The evidence presented in this study - that online opinions do indeed reflect population opinions - constitutes an important contribution to the body of work on online reviews. Although prior work has alluded to the presence of a variety of selection biases in online ratings and digital WOM (Hu et al. 2006; Li et al. 2008), none of these studies have explored a possible correlation between the opinions of online raters and offline consumers. While this finding may appear intuitive, the nature of the self-selection issues that have been discussed and empirically demonstrated in the realm of online ratings is sufficient to make researchers question this ex ante assumption.

Theoretically, this study contributes to extant understanding of the information value of digital WOM by proposing two new hypotheses related to the nature of bias in online reviews. We find evidence of a dominant role for the "sound of silence" effect. While the "sound of silence" has been demonstrated in other contexts, most notably in investigations of the online retailer eBay (Dellarocas et al. 2008), its existence in environments where agents cannot utilize direct retaliation strategies has yet to be confirmed.
One area of fruitful future research stemming from this study may be investigation into the factors that delineate groups exhibiting the "sound of silence" from those which engage in "naming and shaming." A second theoretical advancement lies in the exploration of the hyperbole effect in the context of online physician reviews. Although prior literature has shown hyperbole to be an effect in online reviews, this study is the first to empirically demonstrate the effects of hyperbole on the ability of consumers to delineate quality. We were able to identify the areas in the distribution where consumers may glean the most and the least actual value and insight from the differences in ratings. In this respect we have not only demonstrated the value of the online rating but also identified where ratings provide limited information content in distinguishing physicians from one another.

Additionally, this study contributes to the emerging health IT literature. Most studies have focused on IT in hospital settings and investigated questions related to adoption or payoffs (e.g., Devaraj and Kohli 2003; Goh, Gao, and Agarwal 2011; Kane and Labianca 2011). With the infusion of health IT into the consumer domain, there is a great need to expand the realm of health IT research (Agarwal et al. 2010). One of the most dynamic and rapidly evolving areas is how the internet, especially Web 2.0 tools, affects health. Studies have examined how patients seek information on social networks (Yan and Tan 2010). We investigate the other side of the question: how patients decide to share their assessments of physicians online, and how those opinions are biased (Lee et al. 2006).

Two major implications for policy and practice follow from our findings. First, as noted previously, the AMA and other physician advocacy groups have been the staunchest opponents of the online aggregation of physician ratings. While their concerns that patients lack the training to evaluate physician quality remain a matter of speculation, as well as an avenue of future investigation, the concern that ratings aggregation sites will become digital soapboxes for disgruntled patients appears to be unfounded. Not only are the user ratings overwhelmingly positive on these websites, but the reflection of offline population opinion in the online reviews suggests that these reviews are informative.

The second implication for policy, which becomes apparent from this investigation, is the pressing need for a national repository for online ratings. Much of the recent discourse in healthcare transformation is focused on the notion of a patient-centric healthcare system (Krist et al. 2011), in which the patient is viewed as an informed and engaged actor in managing his/her healthcare and well-being. To the degree that such a vision demands transparency and easily accessible information to drive patient decision making in consequential areas such as physician choice, and because studies suggest that patients value the opinions of others more than clinical quality measures (Fanjiang et al. 2007), repositories can facilitate important comparisons and trade-offs. These data may be complementary to the clinical quality measures proposed for the PQRS.

Two limitations of this study represent fruitful opportunities for future work. First, our findings are based on primary care physicians, and therefore one should exercise caution in generalizing results to other services.
Extending this study to different specialties in healthcare services, such as cardiologists and obstetricians, would provide evidence on the robustness of our findings. Extending the study to additional professional services would further illuminate the existence and nature of biases in online ratings of this important category of services. Second, this study investigates the bias associated with online ratings by comparing them with those provided in offline surveys of patients. The offline survey was conducted using a scientific approach with sufficient sample size and a reasonable response rate, thereby constituting a reasonable basis for identifying bias introduced by the online setting. Nevertheless, survey opinions might be subject to the same selection biases discussed in Section 3. We note that if this were indeed the case, the presence of offline bias would have tended to eliminate, or at least diminish, differences between the online ratings and offline surveys. The fact that we observe differences makes our findings more robust. Finally, patient surveys provide important insights into physician quality and are increasingly emphasized in recent policy initiatives as important indicators of physician performance. A natural extension of this work is to examine the relationship between online ratings and physicians' clinical quality, which we leave to future research.

In conclusion, this paper provides one of the first empirical studies of the information value of online ratings in reflecting the underlying population opinions. We advance knowledge of the nature of online ratings by quantifying the biases associated with the propensity of being rated online, as well as the value of the rating received. If the growth and effects of online reviews in other product categories are a good bellwether, we can only expect consumers to increasingly turn to the internet for making choices related to professional services. And as internet penetration increases, consumer generated ratings will likely continue to grow and become an even more prominent information channel in critical sectors of our society such as healthcare. This naturally raises the question of what type of caution consumers need to exercise when making consumption choices based on online reviews. The findings from this study yield important insights into how and when online ratings can be used to make physician selection decisions that are more informed.

References

Agarwal, R., Gao, G., DesRoches, C., and Jha, A.K. "Research Commentary: The Digital Transformation of Healthcare: Current Status and the Road Ahead," Information Systems Research (21:4) 2010, pp 796-809.
Anderson, E.W. "Customer Satisfaction and Word of Mouth," Journal of Service Research (1:1) 1998, pp 5-17.
Arndt, J. "Role of Product-Related Conversations in the Diffusion of a New Product," Journal of Marketing Research (4:3) 1967, pp 291-295.
Arrow, K.J. "Uncertainty and the Welfare Economics of Medical Care," The American Economic Review (53:5) 1963, pp 941-973.
Bacon, N. "Will Doctor Rating Sites Improve Standards of Care? Yes," BMJ (338) 2009.
Bass, F. "A New Product Growth Model for Consumer Durables," Management Science (15:5) 1969, p 13.
Bearden, W.O., and Mason, J. "Elderly Use of In-Store Information Sources and Dimensions of Product Satisfaction/Dissatisfaction," Journal of Retailing (55:Winter) 1979, p 12.
Brown, J.J., and Reingen, P.H.
"Social Ties and Word-of-Mouth Referral Behavior," The Journal of Consumer Research (14:3) 1987, pp 350-362. Butler, R. "Marketing and Merchandising," Alexander Hamilton Instite, New York, NY, pp. 2021. Chen, Y., and Xie, J. "Third-Party Product Review and Firm Marketing Strategy," Marketing Science (24:2) 2005, pp 218-240. Chevalier, J.A., and Mayzlin, D. "The Effect of Word of Mouth on Sales: Online Book Reviews," Journal of Marketing Research (JMR) (43:3) 2006, pp 345-354. Chintagunta, P.K., Gopinath, S., and Venkataraman, S. "The Effects of Online User Reviews on Movie Box Office Performance: Accounting for Sequential Rollout and Aggregation Across Local Markets," Marketing Science (29:5) 2010, pp 944-957. Cho, Y. "An Analysis of Online Customer Complaints: Implications for Web Complaint Management," 2002, pp. 176b-176b. Cho, Y., Im, I., Hitlz, S., and Fjermestad, J. "Causes and Outcomes of Online Customer Complaining Behavior: Implications for Customer Relationship Management (CRM)," Americas Conference on Information Systems, 2001. Christianson, J., Volmar, K., Alexander, J., and Scanlon, D. "A Report Card on Provider Report Cards: Current Status of the Health Care Transparency Movement," Journal of General Internal Medicine (25:11) 2010, pp 1235-1241. Clemons, E.K., Gao, G.G., and Hitt, L.M. "When Online Reviews Meet Hyperdifferentiation: A Study of the Craft Beer Industry," Journal of Management Information Systems (23:2), Fall2006 2006, pp 149-171. Dellarocas, C., Awad, N., and Zhang, X. "Exploring the Value of Online Reviews to Organizations: Implications for Revenue Forecasting and Planning," International Conference on Information Systems, 2004. Dellarocas, C., Gao, G., and Narayan, R. "Are Consumers More Likely to Contribute Online Reviews for Hit or Niche Products?," Journal of Management Information Systems (27:2), Fall2010 2010, pp 127-157. 30 Dellarocas, C., and Narayan, R. "A Statistical Measure of a Population's Propensity to Engage in Post-Purchase Online Word-of-Mouth," Statistical Science (21:2) 2006, pp 277-285. Dellarocas, C., and Wood, C.A. "The Sound of Silence in Online Feedback: Estimating Trading Risks in the Presence of Reporting Bias," Management Science (54:3), March 1, 2008 2008, pp 460-476. Dichter, E. "How Word-of-Mouth Advertising Works," Harvard Business Review (44:6) 1966, p 147. Dolan, P. "Patients Rarely Use Online Ratings to Pick Physicians," in: American Medical News, American Medical Association, Washington DC, 2008. Duan, W., Gu, B., and Whinston, A.B. "Do online reviews matter? -- An empirical investigation of panel data," Decision Support Systems (45:4) 2008, pp 1007-1016. East, R., Hammond, K., and Wright, M. "The relative incidence of positive and negative word of mouth: A multi-category study," International Journal of Research in Marketing (24:2) 2007, pp 175-184. Fanjiang, G., von Glahn, T., Chang, H., Rogers, W., and Safran, D. "Providing Patients Webbased Data to Inform Physician Choice: If You Build It, Will They Come?," Journal of General Internal Medicine (22:10) 2007, pp 1463-1466. Fox, S., and Jones, S. "The Social Life of Health Information," Pew Research Center. Gao, G., McCullough, J., Agarwal, R., and Jha, A. "A Premier Study of Online Doctor Ratings," Workshop on Information Systems and Economics, St Louis, MO, 2010. Gawande, A. Complications: A Surgeon’s Notes on an Imperfect Science Metropolitan Books, New York, NY, 2002. Godes, D., and Mayzlin, D. 
"Using Online Conversations to Study Word-of-Mouth Communication," Marketing Science (23:4), Fall2004 2004, pp 545-560. Harris, K., and Buntin, M. "Choosing a Health Care Provider: The Role of Quality Information," Policy (1:14) 2008. Harrison-Walker, L.J. "The Measurement of Word-of-Mouth Communication and an Investigation of Service Quality and Customer Commitment As Potential Antecedents," Journal of Service Research (4:1), August 1, 2001 2001, pp 60-75. Henning, P. "Everyday Cognition and Situated Learning," in: Handbook of Researchon Educational Communications, 2004. Holmes, J., and Lett, J. "Product Sampling and Word of Mouth," Journal of Advertising Research (17:5) 1977, p 6. Hu, N., Pavlou, P.A., and Zhang, J. "Can online reviews reveal a product's true quality?: empirical findings and analytical modeling of Online word-of-mouth communication," in: Proceedings of the 7th ACM conference on Electronic commerce, ACM, Ann Arbor, Michigan, USA, 2006, pp. 324-330. Hu, N., Zhang, J., and Pavlou, P.A. "Overcoming the J-shaped distribution of product reviews," Commun. ACM (52:10) 2009, pp 144-147. Jha, A.K., and Epstein, A.M. "The Predictive Accuracy Of The New York State Coronary Artery Bypass Surgery Report-Card System," Health Affairs (25:3), May 1, 2006 2006, pp 844855. Jha, N., Premarajan, K.C., Nagesh, S., Khanal, S., and Thapa, L.B. "Five-star Doctors for the 21st Century," Journal of Health Management (7:2), October 1, 2005 2005, pp 237-247. Jung, C. "Concerning the archetypes, with special reference to the anima concept," Collected Works) 1959. 31 Kane, R.L., Maciejewski, M., and Finch, M. "The Relationship of Patient Satisfaction with Care and Clinical Outcomes," Medical Care (35:7) 1997, pp 714-730. Katz, E., and Lazarsfeld, P. Personal Influence The Free Press of Glencoe, Glencoe, IL, 1955. Kennedy, P. A Guide to Econometrics The MIT Press, Cambridge, MA, 2003. Krist, A., and Woolf, S. "A vision for patient-centered health information systems," JAMA:305) 2011, p 2. Lagu, T., Hannon, N., Rothberg, M., and Lindenauer, P. "Patients’ Evaluations of Health Care Providers in the Era of Social Networking: An Analysis of Physician-Rating Websites," Journal of General Internal Medicine (25:9) 2010, pp 942-946. Landon, L. "A Model of Consumer Complaint Behavior," in: Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 1977. LeGrow, G., and Metzger, J. "E-Disease Management," California HealthCare Foundation. Levine, J. "Welcoming Transparency: Some Physicians Find Merit in Online Ratings," The Journal of Perinatal Education (18:3) 2009, p 7. Li, X., and Hitt, L.M. "Self-Selection and Information Role of Online Product Reviews," Information Systems Research (19:4), December 1, 2008 2008, pp 456-474. Martin, R. "To Doctors' Dismay, Web Ratings Grow.," in: St Petersburg Times, St Petersburg, FL, 2009. McCartney, M. "Will doctor rating sites improve the quality of care? No," BMJ (338), January 1, 2009 2009. Miller, C. "What Would Hippocrates Do?," in: Forbes Magazine, 2007. Moe, W., and Trusov, M. "The Value of Social Dynamics in Online Product Ratings Forums," Journal of Marketing Research) 2010. Rangel, C. "Patient Protection and Affordable Care Act," in: 111-148, t.U.S. Congress (ed.), 2010. Richins, M.L. "Negative Word-of-Mouth by Dissatisfied Consumers: A Pilot Study," The Journal of Marketing (47:1) 1983, pp 68-78. Robb, L. "Accounting for Seasonality with Spline Functions," The Review of Economics and Statistics (62:2) 1980, p 3. Schlosser, Ann E. 
"Posting versus Lurking: Communicating in a Multiple Audience Context," The Journal of Consumer Research (32:2) 2005, pp 260-265. Singh, J. "Consumer Complaint Intentions and Behavior: Definitional and Taxonomical Issues," The Journal of Marketing (52:1) 1988, pp 93-107. Solomon, S. "Doc seeks $12 million for RateMDs comments," National Review of Medicine) 2008. Suits, D.B., Mason, A., and Chan, L. "Spline Functions Fitted by Standard Regression Methods," The Review of Economics and Statistics (60:1) 1978, p 8. Wang, C., Zhang, X., and Hann, I.-H. "The Value of Social Dynamics in Online Product Ratings Forums," in: Workshop on Information Systems Economics, St Louis, MO, 2010. Zhu, F., and Zhang, X. "Impact of Online Consumer Reviews on Sales: The Moderating Role of Product and Consumer Characteristics," Journal of Marketing (74:2) 2010, pp 133-148. 32 Figure 1. The Distributions of Survey Ratings for Physicians Rated and Unrated Online 30% 25% 20% Unrated 15% Rated 10% 5% 0% 30 35 40 45 50 55 60 65 70 75 80 85 90 95 33 Figure 2. Distribution of Online Ratings by Tercile of Survey Rating 90% Percent of physicians within tercile 80% 70% 60% 50% Lowest 40% Middle High 30% 20% 10% 0% 1 2 3 4 5 Online Rating 34 Figure 3. Predicted Online Ratings as a Function of Survey Rating with 95% Confidence Intervals Based on Tobit estimates with cubic splines (Column 5, Table 5). 35 Table 1: Summary Statistics Variable Mean Std. Dev. Online Ratings Rated Online (yes=1) 0.557 0.497 Online Rating value 4.021 1.215 Rating Count 1.772 2.749 Survey Ratings Survey Rating 79.782 8.199 Surveys Filled 55.232 15.828 Physician Characteristics Board 0.799 0.401 Gender 0.681 0.466 Experience 23.968 9.292 Peer Rating 0.164 0.918 Market Characteristics Population (in thousands) 518.123 273.477 Physician Count 15.588 13.790 Rated Physician Count 7.161 6.321 Urban 0.535 0.499 Large Urban 0.403 0.491 Median Income 47.212 10.912 Denver 0.295 0.456 Memphis 0.269 0.443 Min Max 0 1 0 1 5 24 34 10 96 105 0 0 5 0 1 1 60 12 8.969 0 0 0 0 23.27 0 0 897.472 57 27 1 1 82.929 1 1 36 Table 2: Correlation Matrix 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Online Rating Rating Count Survey Rating Rated Online Surveys Filled Board Gender Experience Peer Rating Population Physician Count Rated Physician Count Urban Large Urban Median Income 8 9 10 11 12 13 14 15 Experience Peer Rating Population Physician Count Rated Physician Count Urban Large Urban Median Income 1 2 3 4 5 6 7 -0.093 0.329 . 
Table 2: Correlation Matrix

1 Online Rating; 2 Rating Count; 3 Survey Rating; 4 Rated Online; 5 Surveys Filled; 6 Board; 7 Gender; 8 Experience; 9 Peer Rating; 10 Population; 11 Physician Count; 12 Rated Physician Count; 13 Urban; 14 Large Urban; 15 Median Income. (A dot denotes a correlation that is not defined because the online rating value is observed only for rated physicians.)

          1        2        3        4        5        6        7
 2    -0.093
 3     0.329   -0.07
 4      .       0.569    0.072
 5     0.166    0.009    0.437    0.061
 6     0.04    -0.027    0.078   -0.001    0.051
 7     0.121   -0.069    0.119   -0.024    0.129   -0.082
 8     0.063   -0.041    0.049   -0.028    0.217   -0.269    0.347
 9     0.083    0.005    0.144    0.043    0.216    0.057    0.093
10     0.051   -0.003    0.199    0.033    0.05     0.051   -0.027
11     0.075   -0.012    0.227    0.067    0.155    0.091    0.011
12     0.074    0.044    0.236    0.071    0.119    0.093    0.014
13    -0.018   -0.005    0.074    0.037   -0.032    0.084    0.005
14    -0.002    0.013    0.135    0.004   -0.053    0.05    -0.016
15     0.006    0.164   -0.075    0.094   -0.027    0.043   -0.054

          8        9       10       11       12       13       14
 9     0.059
10     0.083    0.052
11     0.074    0.08     0.532
12     0.044    0.066    0.462    0.853
13     0.036    0.038    0.364    0.112    0.235
14     0.062    0.049    0.419    0.193    0.276    0.745
15    -0.095   -0.011   -0.444   -0.248   -0.124   -0.357   -0.406

Table 3: Probability of a Physician Being Rated Online

                                      (1) Base             (2) Tercile          (3) Quartile
Survey Rating                         0.0135* (0.00772)
Lower Tercile of Survey Rating                             -0.342** (0.136)
Upper Tercile of Survey Rating                             -0.0234 (0.139)
Lower Quartile of Survey Rating                                                 -0.408** (0.160)
2nd Lowest Quartile of Survey Rating                                            -0.136 (0.150)
Upper Quartile of Survey Rating                                                 0.00799 (0.165)
Denver                                0.283* (0.145)       0.291** (0.146)      0.290** (0.146)
Memphis                               -0.0454 (0.177)      -0.0509 (0.178)      -0.0594 (0.178)
Surveys Filled                        0.00604 (0.00423)    0.00567 (0.00413)    0.00520 (0.00413)
Urban                                 0.386** (0.186)      0.390** (0.186)      0.398** (0.187)
Large Urban                           -0.185 (0.184)       -0.199 (0.184)       -0.213 (0.185)
Physician Count                       0.0178** (0.00890)   0.0176** (0.00890)   0.0181** (0.00891)
Rated Physician Count                 -0.0151 (0.0188)     -0.0151 (0.0188)     -0.0159 (0.0188)
Population                            0.000184 (0.000279)  0.000201 (0.000279)  0.000189 (0.000279)
Median Income                         0.0210*** (0.00673)  0.0211*** (0.00674)  0.0208*** (0.00674)
Experience                            -0.0105 (0.00677)    -0.0109 (0.00678)    -0.0106 (0.00677)
Gender                                -0.0110 (0.126)      -0.00241 (0.126)     -0.0144 (0.126)
Board                                 -0.223 (0.145)       -0.220 (0.145)       -0.217 (0.146)
Constant                              -2.201*** (0.671)    -0.977** (0.462)     -0.918* (0.473)
LR Chi2                               47.31                60.08                59.66
Pseudo R2                             0.0243               0.0297               0.0295
N                                     1,425                1,425                1,425

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.

Table 4: Reflection of Survey Ratings on Online Rating Values, Base Specifications

                          (1) OLS               (2) Ordered Logit     (3) Tobit
Survey Rating             0.0469*** (0.00595)   0.0732*** (0.0107)    0.0744*** (0.00985)
Peer Rating               0.0349 (0.0413)       0.128 (0.0875)        0.0802 (0.0747)
Rating Count              -0.0283** (0.0139)    -0.108*** (0.0244)    -0.110*** (0.0205)
Denver                    -0.0654 (0.109)       -0.166 (0.199)        -0.144 (0.173)
Memphis                   -0.103 (0.150)        -0.254 (0.273)        -0.139 (0.243)
Surveys Filled            -0.000392 (0.00335)   0.000542 (0.00609)    -0.00116 (0.00533)
Urban                     -0.0714 (0.134)       -0.0213 (0.247)       0.0577 (0.216)
Large Urban               -0.0414 (0.133)       -0.103 (0.244)        -0.120 (0.213)
Physician Count           -3.71e-05 (0.00648)   0.00335 (0.0121)      0.00178 (0.0106)
Rated Physician Count     0.00208 (0.0138)      -0.00391 (0.0257)     -0.000599 (0.0224)
Population                0.000149 (0.000232)   0.000377 (0.000422)   0.000188 (0.000378)
Median Income             0.00425 (0.00497)     0.00946 (0.00904)     0.00761 (0.00783)
Experience                0.00432 (0.00520)     0.00909 (0.00957)     0.00908 (0.00837)
Gender                    0.177* (0.0943)       0.300* (0.170)        0.239 (0.148)
Board                     0.0724 (0.108)        0.0880 (0.196)        0.0783 (0.172)
Constant                  -0.113 (0.509)                              -1.964** (0.922)
N                         794                   794                   794
Chi2                                                                  131.95
R2                        0.125

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
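For readers who wish to set up analyses of this kind, the following sketch shows how the two core estimations reported above, a probit for the probability of being rated online (Table 3, column 1) and a regression of the online rating value (Table 4, column 1), could be specified in Python with statsmodels. It is a simplified illustration under assumed variable names, not the study's actual code; the paper additionally estimates ordered logit and Tobit specifications of the rating-value equation.

```python
import pandas as pd
import statsmodels.api as sm

# Minimal sketch of the estimations behind Tables 3 and 4
# (hypothetical file and column names).
df = pd.read_csv("matched_physicians.csv")

controls = ["denver", "memphis", "surveys_filled", "urban", "large_urban",
            "physician_count", "rated_physician_count", "population",
            "median_income", "experience", "gender", "board"]

# Table 3, column 1: probit of whether a physician is rated online.
X1 = sm.add_constant(df[["survey_rating"] + controls])
probit_fit = sm.Probit(df["rated_online"], X1).fit()
print(probit_fit.summary())

# Table 4, column 1: OLS of the online rating value on the rated subsample.
rated = df[df["rated_online"] == 1]
X2 = sm.add_constant(rated[["survey_rating", "peer_rating", "rating_count"]
                           + controls])
ols_fit = sm.OLS(rated["online_rating"], X2).fit()
print(ols_fit.summary())
```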
Table 5: Reflection of Survey Rating on Online Rating Values, Allowing for Flexibility Across the Survey Rating Distribution

Panel A: Splines with knots at the quartiles of the survey rating distribution

                                    (1) OLS               (2) Ordered Logit     (3) Tobit
Spline-Lower Quartile of
  Survey Rating                     0.0397*** (0.0122)    0.0488** (0.0214)     0.0506*** (0.0194)
Spline-Middle Half of
  Survey Rating                     0.0606*** (0.0130)    0.0969*** (0.0235)    0.0993*** (0.0210)
Spline-Upper Quartile of
  Survey Rating                     0.0167 (0.0272)       0.0663 (0.0527)       0.0580 (0.0454)
Peer Rating                         0.0417 (0.0419)       0.125 (0.0877)        0.0793 (0.0749)
Rating Count                        -0.0279** (0.0139)    -0.109*** (0.0244)    -0.110*** (0.0205)
Denver                              -0.0606 (0.109)       -0.146 (0.199)        -0.124 (0.174)
Memphis                             -0.0959 (0.150)       -0.276 (0.274)        -0.160 (0.244)
Surveys Filled                      -0.000354 (0.00335)   0.000362 (0.00609)    -0.00139 (0.00535)
Urban                               -0.0634 (0.135)       -0.0196 (0.247)       0.0556 (0.216)
Large Urban                         -0.0485 (0.133)       -0.111 (0.244)        -0.122 (0.213)
Physician Count                     0.000349 (0.00648)    0.00384 (0.0122)      0.00253 (0.0107)
Rated Physician Count               0.00102 (0.0138)      -0.00568 (0.0257)     -0.00265 (0.0225)
Population                          0.000143 (0.000232)   0.000383 (0.000422)   0.000197 (0.000378)
Median Income                       0.00422 (0.00497)     0.00901 (0.00904)     0.00718 (0.00783)
Experience                          0.00435 (0.00521)     0.00876 (0.00957)     0.00892 (0.00839)
Gender                              0.178* (0.0946)       0.310* (0.170)        0.244* (0.148)
Board                               0.0784 (0.109)        0.0819 (0.196)        0.0748 (0.172)
Constant                            0.345 (0.910)                               -0.259 (1.591)
N                                   794                   794                   794
Chi2                                                                            134.06
R2                                  0.127

Panel B: Splines for the low end, middle, and high end of the survey rating distribution (cubic splines; column 5 underlies Figure 3)

                                    (4) OLS               (5) Tobit             (6) Ordered Logit
Spline-Low End of Survey Rating     0.0423*** (0.0109)    0.0532*** (0.0173)    0.0557*** (0.0191)
Spline-Middle of Survey Rating      0.0717*** (0.0194)    0.117*** (0.0311)     0.117*** (0.0353)
Spline-High End of Survey Rating    0.0176 (0.0208)       0.0530 (0.0343)       0.0537 (0.0398)
Peer Rating                         0.0424 (0.0417)       0.0805 (0.0746)       0.128 (0.0872)
Rating Count                        -0.0273* (0.0139)     -0.108*** (0.0205)    -0.109*** (0.0244)
Denver                              -0.0621 (0.109)       -0.127 (0.174)        -0.149 (0.199)
Memphis                             -0.0922 (0.150)       -0.158 (0.244)        -0.271 (0.274)
Surveys Filled                      -0.000296 (0.00336)   -0.00140 (0.00535)    0.000338 (0.00609)
Urban                               -0.0591 (0.135)       0.0624 (0.216)        -0.0139 (0.247)
Large Urban                         -0.0499 (0.133)       -0.124 (0.213)        -0.112 (0.244)
Physician Count                     0.000544 (0.00649)    0.00280 (0.0107)      0.00405 (0.0121)
Rated Physician Count               0.000506 (0.0138)     -0.00340 (0.0225)     -0.00623 (0.0257)
Population                          0.000141 (0.000232)   0.000188 (0.000378)   0.000376 (0.000422)
Median Income                       0.00427 (0.00497)     0.00700 (0.00783)     0.00894 (0.00904)
Experience                          0.00433 (0.00520)     0.00915 (0.00839)     0.00894 (0.00956)
Gender                              0.177* (0.0945)       0.243 (0.148)         0.308* (0.170)
Board                               0.0765 (0.109)        0.0745 (0.172)        0.0823 (0.196)
Constant                            0.159 (0.835)         -0.560 (1.461)
N                                   794                   794                   794
Chi2                                                      134.14
R2                                  0.128

Standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
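The spline specifications in Table 5 allow the relationship between the survey rating and the online rating to differ across segments of the survey rating distribution. The sketch below illustrates the idea with a piecewise-linear basis whose knots sit at the quartiles of the survey rating, in the spirit of columns 1-3 (the paper's Figure 3 is based on the cubic-spline Tobit variant). File and variable names are hypothetical and the control variables are omitted for brevity; this is not the study's code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Minimal sketch of a linear-spline specification with knots at the
# quartiles of the survey rating (hypothetical file and column names;
# controls omitted for brevity).
df = pd.read_csv("matched_physicians.csv")
rated = df[df["rated_online"] == 1].copy()

q25, q75 = rated["survey_rating"].quantile([0.25, 0.75])

# Piecewise-linear basis: separate slopes below q25, between the
# quartiles, and above q75, continuous at the knots.
s = rated["survey_rating"]
rated["spline_low"] = np.minimum(s, q25)
rated["spline_mid"] = np.clip(s - q25, 0, q75 - q25)
rated["spline_high"] = np.maximum(s - q75, 0)

X = sm.add_constant(rated[["spline_low", "spline_mid", "spline_high"]])
fit = sm.OLS(rated["online_rating"], X).fit()
print(fit.summary())

# Predicted online rating across the survey-rating range (cf. Figure 3).
grid = np.linspace(s.min(), s.max(), 50)
basis = pd.DataFrame({
    "spline_low": np.minimum(grid, q25),
    "spline_mid": np.clip(grid - q25, 0, q75 - q25),
    "spline_high": np.maximum(grid - q75, 0),
})
print(fit.predict(sm.add_constant(basis)))
```

A flat slope estimate on one segment (for example, the statistically insignificant upper-quartile spline) indicates a region of the quality distribution where online ratings carry little information for distinguishing physicians, which is how the paper locates where ratings are and are not informative.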