Discretionary Task Ordering: Queue Management in Radiological Services

Maria R. Ibanez* (Harvard Business School)
Jonathan R. Clark (University of Texas at San Antonio)
Robert S. Huckman (Harvard Business School)
Bradley R. Staats (University of North Carolina at Chapel Hill)

Working Paper 16-051
October 19, 2015

Copyright © 2015 by Maria R. Ibanez, Jonathan R. Clark, Robert S. Huckman, and Bradley R. Staats

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.

Abstract

A long line of research examines how best to schedule work to improve operational performance. This literature traditionally takes the perspective of a central planner who can structure work and then expect individuals to execute tasks in a prescribed order. In many settings, however, workers have discretion to deviate from the assigned order. This paper considers the operational implications of “discretionary task ordering,” defined as the task sequence resulting from an individual’s ability to select which task to complete next from a work queue. Using data from 91 physicians reading a total of more than 2.4 million radiological studies over a period of two and a half years, we examine the conditions under which discretion is exercised and the performance effects of those choices. We find that, on average, deviations lead to slower read times.
Doctors tend to deviate more with experience and when they have more variety within their queue. Interestingly, deviations tend to be more effective under those conditions, yet the improvement is not enough to offset the average deviation penalty. To develop our results further, we explore two common ordering heuristics: shortest expected processing time and batching similar cases. We find that choosing the shortest cases first is particularly detrimental for speed. Finally, batching is associated with better performance when it occurs naturally, but not when it results from using discretion. Our research identifies a “discretion fallacy,” offering a behavioral perspective on queue management and highlighting that managers must be careful when allowing discretion within worker queues.

Key words: discretion, scheduling, queue, healthcare, learning, experience, decentralization

*Corresponding author. Email: mibanez@hbs.edu. Address: Harvard Business School, Wyss House, Soldiers Field Road, Boston, MA 02163. We appreciate helpful comments from Michael Luca, Ananth Raman, Ariel Stern, and Michael Toffel. We thank management at our field site for their commitment to this work.

1. Introduction

The scheduling of work is a key driver of operational performance in many settings, including factories (Berman, Larson and Pinker 1997), trucking (Roberti, Bartolini and Mingozzi 2015), healthcare (KC and Terwiesch 2009), and financial services (Staats and Gino 2012). Accordingly, a rich line of research investigates task-scheduling policies with analytical models taking the perspective of a central planner and identifying optimal schedules that managers should implement (Pinedo and Yen 1997; Pinedo 2012). In many practical settings, however, workers may choose their own approach to sequencing or prioritizing work.
Those who execute the tasks often have discretion over the order in which to perform their assigned duties, yet little is known about the drivers of workers’ decisions to exercise such discretion and how scheduling should be managed in circumstances where discretion exists. In this paper, we consider the operational implications of discretion over task ordering, defined as an individual’s ability to select which task to complete next from a work queue. Worker discretion is gaining importance across industries, as technological advances are facilitating the delegation and monitoring of decisions made by front-line workers in manufacturing, services, and knowledge work (Brynjolfsson and McAfee 2014; Pierce, Snow and McAfee 2014). Nevertheless, the fact that discretion can be used does not necessarily mean that it should be used. Boudreau et al. (2003) note that when considering the use of discretion, individuals may “choose the ‘wrong’ task (operationally)” (p. 186). Therefore, we consider a worker’s decision about which task to execute next among a queue of pending, independent tasks; although the assigned order would suggest choosing the first task in the queue, the worker may choose to exercise discretion by selecting a task from the rest of the queue. Hereafter, “deviation” denotes the exercise of discretion over task ordering via the selection of a task that is not the next one according to the assigned queue.

We address two research questions. First, what are the drivers of deviations? Second, what are the subsequent performance implications? To identify the drivers of deviations, we consider the circumstances under which knowledge workers (in our empirical setting, radiologists) are more likely to exercise discretion over the order in which they execute tasks. Workers may choose to exercise discretion over task ordering when they believe that the assigned order is suboptimal.
The ability of the worker to identify an alternative task sequence that they perceive as superior to the assigned sequence will depend on the characteristics of the individual as well as the set of options available (i.e., characteristics of the queue). With respect to the characteristics of the individual, we examine the role of worker experience in task discretion. With respect to the characteristics of the queue, we examine the contents of the individual’s queue of tasks to be executed. To understand the consequences of deviations, we analyze their effects on performance, measured in our empirical setting as the speed of completing a task while controlling for quality.

We test these hypotheses in the context of outsourced radiological services, often referred to as teleradiology. In our setting, board-certified physicians working for a single company read digital images of different types (e.g., chest X-rays, head CT scans, spine MRIs) from computer monitors. These radiologists are trained to read a variety of study types, such that images are assigned randomly, conditional on a doctor’s qualifications, to doctors’ individual queues. In addition to data on the images and the physicians reading them, the information system provides time stamps marking when each image entered a doctor’s queue and when the doctor submitted the interpretation report. For each image, we can observe whether the radiologist interpreted it following the assigned order or deviated from that approach.

Our findings suggest that doctors tend to deviate more often when they are more experienced, when there is more variety in their queue (as measured by the number of unique items), when they can follow a shortest expected processing time (SEPT) policy by deviating (i.e., when there is an opportunity to deviate towards what are expected to be shorter cases), and when there is an opportunity to batch by repeating the prior case type.
When doctors deviate, however, their average read time tends to increase. The level of worker experience and the variety of the queue moderate this negative performance impact, yet the deviation penalty still exists, on average. Although SEPT may create the illusion of working faster precisely because it selects shorter cases, it tends to reduce speed and is a particularly detrimental type of deviation. Consistent with theory, prior empirical work, and the beliefs of the radiologists we interviewed, batching is associated with superior performance, yet this is not the case when it results from a deviation. This provides evidence of the potentially harmful performance effects of exercising discretion when an individual underestimates the costs of deviating in relation to the potential gain, a phenomenon we term the “discretion fallacy.”

Our paper makes several contributions to both theory and practice. First, we are among the first to focus attention on the role of discretion in queue management. Whether in call centers, software companies, or doctors’ offices, technology increasingly permits system designers to allow employees to choose the order in which they complete tasks. This element of system design needs greater theoretical and empirical attention to understand its performance implications, and we provide important evidence related to this goal. Second, we identify the conditions under which individuals are more likely to deviate from the assigned queue. For example, examining the role of experience and the contents of the queue in this process provides insight into the design of work systems. Finally, we evaluate the performance implications of these choices. While prior work evaluates the performance implications of discretion in terms of effectiveness, efficiency is often overlooked. We fill this void by incorporating the cost of discretion in our evaluation of performance.
We find that the reordered queue appears to be more efficient (for example, it batches tasks to decrease set-ups), yet processing time (which includes the time cost of executing deviations) tends to be higher for deviations. Our analysis suggests that the time cost of reorganizing the queue may make queue improvements inefficient, underscoring the value of having a centralized manager perform queue management in a pooled manner, rather than dividing it across workers. While deviation is unlikely to be problematic in all situations, our findings illustrate that it can be and that managers must carefully evaluate all of the operational implications of allowing discretion. Even such types of deviations as batching, which would be recommended a priori, may have a higher deviation penalty (due to additional set-up time) than the resulting benefit. Our research highlights that workers may engage in the discretion fallacy, suggesting that managers must be careful in providing discretion within the queue.

2. Related Literature

Our research is related to four major streams of work in operations: scheduling, gatekeepers, discretionary process times, and managerial decision-making. The scheduling literature investigates the optimal allocation of scarce resources (e.g., a machine or a worker) to tasks over a given set of time periods (Pinedo 2012). The problems considered include project scheduling, transportation scheduling (e.g., Zhu, Crainic and Gendreau 2014), appointment scheduling (e.g., Bassamboo and Randhawa 2015; Truong 2015), and workforce scheduling (e.g., Berman et al. 1997). Scheduling has been an influential area of research since the 1950s; the optimization problem can have multiple objectives and typically assumes the use of a central planner. Our work contributes to this literature by both providing empirical evidence of worker discretion with respect to sequencing and showing the performance consequences of these actions.
A stream of research that considers worker discretion over operational decisions is the literature on gatekeepers. In this literature, a front-line worker who is capable of solving low-complexity tasks receives a customer’s request and then decides whether to attempt to complete the task or refer the customer to a specialist (Shumsky and Pinker 2003). Similar to our work, this literature recognizes that the structure of work depends on decisions made by front-line workers. For example, in healthcare, the fact that triage nurses in emergency departments can correctly predict whether a patient will be admitted to the hospital a high percentage of the time (e.g., King, Ben-Tovim and Bassham 2006) provides motivation for patient flow policies that exploit this local knowledge of gatekeepers (Saghafian et al. 2014). We build upon this work, as individuals in our setting do not choose whether to route work elsewhere but instead choose the order in which they complete their own work.

Discretion has also been studied in terms of workers exercising choice that leads to differences in task completion times. This can be seen in the context of the speed-quality tradeoff in service and professional settings where quality increases with the duration of the provider-customer interaction. Analytical models capturing this type of discretion show that, under certain conditions, adjusting processing times to system congestion (i.e., workload) implies that increasing capacity might intensify congestion (Hopp, Iravani and Yuen 2007) and server competition leads in equilibrium to superior service quality but higher price (Anand, Paç and Veeraraghavan 2011).
Empirical studies find evidence that workers’ choices influence task completion outcomes and times (KC and Terwiesch 2009; Powell, Savin and Savva 2012) and identify implications for staffing and the scheduling of workers (Lu, Hechling and Olivares 2014; Tan and Netessine 2014; Berry Jaeker and Tucker 2015). Our research extends this work as we investigate, in a field setting, the individual decisions that people make when using discretion and then map these choices to performance.

Finally, researchers have also investigated the drivers and quality of managers’ decisions over operating variables. Studies document that human judgment exhibits decision biases and anomalies in operational decisions ranging from forecasting (Kremer, Moritz and Siemsen 2011) to inventory management (Schweitzer and Cachon 2000). Despite the potential for bias, the management coefficients theory (Bowman 1963) postulates that managers possess information regarding the current environment that can improve decision rules derived from traditional analysis, implying that managers should be allowed to modify decision rules periodically to ensure optimality. The effectiveness of managerial decision-making has been evaluated in the field in a variety of domains, such as inventory management, capacity allocation, and pricing (van Donselaar et al. 2010; Campbell and Frei 2011; Phillips, Şimşek and Ryzin 2015). We extend this line of work by looking at decisions in a different domain (i.e., scheduling) and by incorporating the time cost of making decisions (i.e., evaluating the efficiency, not just the effectiveness, of discretion).

3. Discretionary Task Ordering

3.1 Drivers of Deviations from the Assigned Task Order

Many jobs consist of executing a series of sequential, independent, and previously ordered tasks. Examples include doctors seeing patients, delivery drivers completing drop-offs, mechanics fixing cars, or back-office processors completing claims.
In such settings, workers have visibility into the queue and often have the ability to deviate from the assigned order of the queue, resulting in discretionary task ordering. Although the assigned order would suggest choosing the first task in the queue, the worker may choose a task from the rest of the queue. We refer to the selection of a task other than the first as a “deviation.” Workers may choose to exercise discretion over task ordering when they believe that the assigned order is not optimal for performance.1 We posit that the tendency to deviate will depend on attributes of the worker and the queue.

1 Workers may also choose to reorder tasks based on personal incentives. In this paper, we focus on operational drivers, and present an empirical setting without personal incentives in conflict with performance.

We first consider the individual. We focus on a worker’s experience for several reasons. First, with experience comes the ability to identify queue inefficiencies and opportunities to improve upon the assigned order. Significant attention has been given to the relationship between experience and process improvement, finding that additional experience typically leads to learning (Lapré and Nembhard 2010; Argote and Miron-Spektor 2011). One reason why experienced individuals show improvement is that they recognize more opportunities to change their work, including possibly the structure of the work within their queue. It is, therefore, not surprising that workers are frequently granted more discretion as they gain experience. The implicit assumption is that with such experience workers have knowledge that should permit them to use discretion effectively. A second reason to study experience and discretion is that there is a risk that experienced individuals may grow overconfident in their abilities to judge when to deviate successfully and thus do so inappropriately (cf. Tost, Gino and Larrick 2015).
Finally, there is a risk with experience that individuals may grow bored with their job, a problem that can be ameliorated by variety (Staats and Gino 2012). In turn, this could mean that they deviate to avoid monotony. As a result of each of these arguments, we hypothesize that:

HYPOTHESIS 1: The probability that an individual deviates from the next case in the queue increases with that individual’s level of experience.

We next turn to our second area of exploration concerning the conditions under which individuals deviate: the content of the queue. The tasks in the queue represent the set of possible options the worker could choose to address at any given point. When presented with more options, an individual has a greater likelihood of recognizing a beneficial opportunity to reorder tasks. For a given number of tasks in the queue, a higher number of unique types of tasks in the queue (i.e., queue variety) will increase the ability of the worker to follow specific task-sequencing strategies. Thus, if a worker has additional knowledge of how to sequence tasks in an improved manner, then it is more likely that she will be able to execute that strategy when the queue offers greater variety. Hence, variety of case types in the queue should increase the probability that the queue contains a case type that is more desirable than the first case in the queue, thereby increasing deviations:

HYPOTHESIS 2: The probability that an individual deviates from the next case in the queue increases with the variety of that queue.

One of the task sequencing strategies that individuals may pursue is a shortest expected processing time (SEPT) policy. SEPT is a scheduling policy in which the job with the shortest expected processing time is completed next. There are operational and motivational reasons to follow this policy. From an operational point of view, this scheduling discipline minimizes the average number of jobs in the system and the average job wait time.
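The operational claim can be checked on a toy example. The following Python sketch is illustrative only: the four expected read times are hypothetical, all jobs are assumed to be in the queue at time zero (unlike the dynamic queue studied in this paper), and setup and task-selection time are ignored. Under those assumptions, ordering by shortest expected processing time attains the lowest average wait among all possible orderings.

```python
from itertools import permutations


def mean_wait(processing_times):
    """Average time a job waits before its own processing starts."""
    waits, clock = [], 0.0
    for p in processing_times:
        waits.append(clock)  # this job waited until all earlier jobs finished
        clock += p
    return sum(waits) / len(waits)


# Hypothetical expected read times (minutes), listed in assigned order.
assigned = [9.0, 2.0, 6.0, 3.0]
sept = sorted(assigned)  # shortest expected processing time first

fifo_wait = mean_wait(assigned)
sept_wait = mean_wait(sept)
# Brute-force check over every possible ordering of the four jobs.
best_wait = min(mean_wait(list(order)) for order in permutations(assigned))

print(fifo_wait, sept_wait, best_wait)
```

The comparison concerns waiting time only; it says nothing about the time a worker spends searching the queue in order to apply the policy.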
In addition, precisely because of the operational benefit, workers might find motivation in improving their own performance on these metrics, even if they are not evaluated on them. There is an additional motivational reason for starting with the shortest jobs: prior work finds that the act of completing a task leads to feelings of progress and achievement (Gal and McShane 2012). In other words, individuals may exhibit a preference towards completion, even in settings where their self-interest would be better served by completing tasks in a different order. For example, Amar et al. (2011) find that individuals choose to pay back smaller debts with lower interest rates, to accomplish completion, instead of paying back the same amount of a larger debt with a higher interest rate (and thus saving money in the process). In our setting, that means that individuals might take on the shortest tasks first to trigger completion-driven feelings of progress and achievement. Thus, if the worker reorganizes the queue according to SEPT, then she will be more likely to deviate when the remaining queue (i.e., the queue excluding the first item) contains the task type in the queue with the shortest expected processing time.

HYPOTHESIS 3: The probability that an individual deviates from the next case in a queue is higher when the first task in the queue is not of the task type with the shortest expected processing time in the queue.

A second category of task sequencing strategies involves batching, that is, grouping tasks by their types to increase the repetition of similar activities. People can avoid the cost of switching (e.g., either mental or physical setup costs) by selecting a case that repeats the predecessor’s case type. Thus, one specific reason to deviate may be the batching of tasks. However, batching is not always possible; the opportunity to batch depends on the availability of at least one task of the same type as the predecessor.
In some cases, the first task in the queue is of the same type as the predecessor; in such situations, respecting the assigned order would already bring the benefits of batching. When the first task in the queue is not of the same type as the predecessor, one could deviate towards batching if the rest of the queue contains a repetition of task type. If individuals have an intention to batch, then their likelihood of deviating should be higher when there is an opportunity to batch by doing so (i.e., when they cannot batch by following the first case in the queue but can batch by deviating).

HYPOTHESIS 4: The probability that an individual deviates from the next case in a queue is higher when the first item in the queue is not of the same type as its predecessor, but the remainder of the queue offers an opportunity to repeat the predecessor case type (batch).

3.3 Performance Implications

At least since the origin of the scientific management movement, operations as a field has sought to understand the drivers of effective performance (Taylor 1911). Over time, improvements have been identified in many areas, from queueing system design (Gans, Koole and Mandelbaum 2003) to task scheduling (Pinedo and Yen 1997) to product variety (Fisher and Ittner 1999). Recent work has increasingly considered human aspects of operations management problems (Boudreau et al. 2003), incorporating such behavioral factors as workload (KC and Terwiesch 2009; Powell et al. 2012; Tan and Netessine 2014; Berry Jaeker and Tucker 2015), team composition (Huckman, Staats and Upton 2009; Schultz, Schoenherr and Nembhard 2010), and learning from experience (Huckman and Pisano 2006; Narayanan, Balasubramanian and Swaminathan 2009).

In this paper, we wish to understand how discretion affects the time required to complete a task. We consider the case where a worker must eventually complete all tasks within her dynamic queue but can choose how to order them.
Completion time for person i completing task j can be expressed simply as:

$$\text{Completion\_time}_{ij} = \text{Setup\_time}_{ij} + \text{Processing\_time}_{ij} \tag{1}$$

Under discretionary task ordering, where individuals select which task to execute next, setup time includes both the search time incurred investigating the queue and choosing a task (task selection time) and the time invested preparing to execute the new task (standard setup or changeover). Processing time represents the “run time” required to complete the task itself.

We begin by exploring how the exercise of discretion might improve completion time. Research increasingly recognizes that front-line personnel often have information that is not available to a central planner and that helps them recognize improvement opportunities that the planner cannot observe (MacDuffie 1997; Tucker 2007; Staats, Brunner and Upton 2011). For example, delivery drivers may adjust their daily route after observing a road under construction during the prior day. Thus, even if we assume that the queue has been optimally organized by the central planner with the knowledge that she has, it is possible that the worker may recognize opportunities to improve upon that initial organization. Moreover, many queues are not optimally organized to begin with, creating more improvement opportunities. Hence, workers may deviate to apply generally accepted best practices for task scheduling. For example, after completing a task, an individual could recognize that a task further in the queue is very similar to the just-completed task and so choose to complete it next. Such batching could bring benefits in terms of decreased setup time, even if such setup time is just cognitive. Further, it could also provide processing time benefits, as the relevant knowledge is still in the individual’s working memory, allowing her to complete the work quickly and avoid interruptions (Bendoly, Swink and Simpson 2014; Froehle and White 2014; Wang et al. 2015).
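The decomposition in equation (1), with setup time split into task selection time and changeover, can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not an estimate: the class name `TaskTimes`, its field names, and every number are hypothetical and do not come from our data. It shows only how the components combine, and that a batching deviation that lowers changeover time shortens completion time only if the saving exceeds the added selection time.

```python
from dataclasses import dataclass


@dataclass
class TaskTimes:
    """Hypothetical time components (minutes) for one person-task pair."""
    selection: float   # time spent inspecting the queue and choosing a task
    changeover: float  # standard setup when switching to the task
    processing: float  # run time of the task itself

    @property
    def setup(self) -> float:
        # Setup time under discretionary ordering: selection plus changeover.
        return self.selection + self.changeover

    @property
    def completion(self) -> float:
        # Equation (1): completion time = setup time + processing time.
        return self.setup + self.processing


# Illustrative numbers only. Following the assigned order needs no search;
# deviating to batch halves the changeover but adds search time.
in_order = TaskTimes(selection=0.0, changeover=1.0, processing=5.0)
batched_deviation = TaskTimes(selection=0.75, changeover=0.5, processing=5.0)

print(in_order.completion, batched_deviation.completion)
```

With these made-up values the deviation is slower overall; with a smaller selection time the same deviation would be faster, which is why the net effect is an empirical question.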
In addition to the benefits from improved task sequencing, it is possible that increased discretion could result in improved performance due to higher worker engagement and motivation (Hackman and Oldham 1976; Grant et al. 2010). It is not clear that this would be seen only when discretion is exercised, although it is certainly possible that the exercise of discretion would make the benefit more salient. Given these different factors, we hypothesize:

HYPOTHESIS 5A: Task deviation leads to faster completion time (i.e., better performance), on average.

Although exercising discretion could be beneficial, there are several reasons why it might instead prove distracting. First, by rearranging tasks in the queue, a worker is adding a search cost to the setup time. Second, studies in psychology note that individuals make decisions either intuitively or deliberately and that these processes use different systems within the brain (Stanovich and West 2000). By exercising discretion, an individual may be likely to move herself from her quicker, intuitive approach into the slower, more deliberative approach. Although there may be times that this is an appropriate switch, ideally value-added work should drive that change (e.g., a doctor’s case is particularly difficult and so requires additional attention), rather than having the individual’s reorganization of the queue driving the effect. Research on cognitive distractions also suggests that by switching across tasks, a worker may slow performance through longer times for both setup and processing (KC and Staats 2012; Staats and Gino 2012; Froehle and White 2014). Finally, we note that exploring these effects empirically is important since an individual could overestimate the gains noted above, or underestimate the losses noted here, developing the misperception that a particular type of detrimental deviation is favorable.
This misperception could cause the worker to repeat the same mistakes over time, thus lowering performance. Given the potential for conflicting effects on performance, we offer the following competing hypothesis:

HYPOTHESIS 5B: Task deviation leads to slower completion time (i.e., worse performance), on average.

The exercise of discretion may have stronger effects under some conditions than under others. To understand when these performance implications are more pronounced, we go beyond testing for the average effect of deviations on performance to consider the drivers of deviations proposed in Hypotheses 1 through 4 as moderators. To the extent that these moderators are expected to have positive direct effects on performance, we would anticipate that they would improve the impact of deviation on performance. For example, to the extent that repetition of task type is associated with superior performance, one would expect deviations to be more effective when they result in batching than otherwise.

4. Setting, Data and Models

4.1 Empirical Setting – Outsourced Teleradiology Services

We test our hypotheses using transaction-level data from one of the largest teleradiology firms in the United States. In this setting, radiologists sequentially interpret “cases,” which correspond to a set of images of a particular anatomical area (e.g., abdomen, head) and technology for a particular patient. Technologies used in our setting include X-rays (electromagnetic waves, 3.68% of our final sample, which is described below), computed tomography (CT, 84.26%), nuclear medicine (1.01%), magnetic resonance imaging (MRI, 0.85%), and ultrasound (10.20%). The anatomical areas include abdomen (5.58%), body (combination of images, 36.31%), brain (33.23%), breast (0.01%), cardio (0.2%), chest (12.78%), gastrointestinal/genitourinary (1.24%), head and neck (2.44%), musculo (1.31%), obstetrics (2.6%), pelvis (0.53%), spine (3.73%), and other (0.04%).
The company receives cases from clients (typically hospitals or physician group practices) and assigns the reading of them to individual radiologists. Both clients and radiologists log into the company’s proprietary online system to submit, read, send, or receive cases. Cases of different types arrive from client hospitals and are assigned to eligible radiologists as they arrive. To be eligible to receive a case, a radiologist must have been trained in the anatomical area and technology. Moreover, the radiologist must be licensed by the state and credentialed by the hospital where the radiological image was created. Given these requirements, most radiologists in the company can interpret the majority of cases, and are licensed in over 35 states and at one-third of the client hospitals. Finally, an eligible radiologist must be on duty during the period when the study arrives. Conditional on a radiologist meeting the availability and eligibility requirements, case assignment to radiologists is random. Once a case is assigned to a particular radiologist, it is not reallocated to another radiologist.

Radiologists work independently without supervision, only see cases assigned to them, and have discretion over the order in which they read cases in their queue. At any point in time, the radiologists can see their own queue of pending cases. We observe, and can thus control for, the factors that the radiologist observes when deciding which case to select next. In particular, for each case in the queue, the radiologist sees: (1) the time when the case was assigned to her queue, (2) the technology employed to create the images, (3) the anatomical location of the study, and (4) the number of images. The average size of a radiologist’s queue is 5.6 cases, and new cases keep joining this dynamic queue while the radiologist is on duty.
Hence, radiologists make a decision, explicitly or implicitly, regarding which case to read next every time they start a case rather than reordering all of their cases at the beginning of a shift, as could happen in settings where all tasks to be completed are known initially.

The radiologists in our setting aim to maximize their overall speed subject to delivering the correct clinical interpretation. They seek to maximize speed for both business and clinical reasons. On the business side, teleradiology companies compete for business on the promise of fast service. On the clinical side, unlike some service settings, completion speed is a major determinant of quality, as timely access to the reading report is critical for the patient’s referring doctor to deliver the proper treatment. This positive relationship between speed and quality has been noted in prior work on healthcare (Pisano, Bohmer and Edmondson 2001). While radiologists seek to maximize speed, they do so subject to the constraint of maintaining acceptable quality. The teleradiology company tracks reading discrepancies, whereby a customer receiving a radiological report may raise an objection, including minor comments. The data shows that such discrepancies are rare, affecting only 0.2% of the images interpreted during our sample period, so it is reasonable to assume that the clinical quality of the reads is deemed acceptable.

Numerous features of this research site make it an ideal setting to explore our questions and mitigate concerns about gaming behavior or other reasons unrelated to performance that might cause radiologists to prioritize certain cases. First, in this teleradiology company, there is no preemption. Both the response and reading times are quick, so radiologists do not interrupt the reading of a case once it is in progress. Second, all cases are deemed urgent, so there is no prioritization based on medical emergency.
Third, given the time sensitivity of the service, cases are not left in a doctor’s queue before any break above thirty minutes or by the end of their shift; therefore, postponing a job does not affect the case-mix or the workload, and there is no need to prioritize shorter cases by the end of the shift due to time constraints. Fourth, the type (technology-anatomy) corresponding to arriving cases is independent of doctors’ speed and the types of cases they have previously received or completed, addressing any remaining concern about prioritizing cases by type to affect case mix. Fifth, there are no mandatory order restrictions (such as predecessor tasks that would need to be performed before the current one) or external factors (such as collaboration with other individuals, Halsted and Froehle 2008; Wang et al. 2015) that could limit the doctor’s discretion. Finally, there is no need to reorder cases to put the images for a single patient together, as that grouping is taken care of when the case is initially assembled.

Therefore, in this setting, individuals seeking to improve performance are given a task schedule based on the random arrival times of cases and are allowed to adjust it. Each case represents a well-defined task, and the radiologists have the freedom to decide which case to work on next. Because the number of cases in a radiologist’s queue is 5.6 on average, they can reasonably inspect the queue to evaluate alternative sequencing strategies. Management does not systematically instruct the radiologists to follow any particular scheduling policy.

While the company did not provide access to the radiologists whose work is captured in our data, we interviewed several radiologists at other institutions with similar processes to understand how they approached task ordering and deviation. We found that the concept of a dedicated, individual queue and freedom to order tasks in that queue were common.
These radiologists also indicated that they often would choose to deviate in the ordering of tasks. They provided different explanations, including a desire to read faster cases first and an intention to repeat the same technology-anatomy combination (batching), in line with Hypotheses 3 and 4. In particular, the radiologists indicated that they thought that batching was helpful in the interpretation of images, and highlighted the importance of preparing their minds to look at a specific anatomical area at a given point in time.

In our final sample (described below), half of the deviations are consistent with the two reasons to deviate discussed above. First, 48% of deviations are towards a case type with the shortest average reading time, consistent with a shortest expected processing time (SEPT) scheduling policy.[2] Second, 15% of deviations are towards a batching (repetition) opportunity.[3] Note that deviating towards batching is not always possible, as it is conditional on the case types available in the remainder of the queue. When batching is possible in our sample, the percentage of deviations that are consistent with batching is 46%. These deviations increase repetitions from 9% (assigned) to 13% (actual).

4.2 Data

Our data covers all 2,766,209 cases processed by the teleradiology company between July 2005 and December 2007. We observe both the order in which jobs are assigned to radiologists and the order in which they are completed, with differences in the two being due to a radiologist’s exercise of discretion. For each case, we observe its clinical/technical characteristics (e.g., technology, anatomy, and number of images), the identity of the radiologist who interpreted it, the time it was assigned to the radiologist, and the time the radiologist’s work was completed.
From this information, we compute the set of cases in a radiologist’s queue at any point in time and identify whether each case represents the first in the radiologist’s queue when it is interpreted.

We impose three restrictions on the initial sample. First, because we seek to study decisions made by radiologists about whether to deviate from the queue, we limit the sample to those cases that were selected from a queue of at least two cases. This restriction eliminates those cases that were the only ones in the queue when the radiologist started working on them, as there would be no potential for a radiologist to deviate from the assigned order in such instances. Second, we drop cases for which the time elapsed since the last case exceeds thirty minutes (the 99.5th percentile), as we assume that it represents a new shift or break. We do not have records of the exact breaks taken by the radiologists, who lack fixed schedules and rules for breaks. Through this restriction, we define a shift break as a period longer than thirty minutes without completing a case, and drop the first observation for each shift because the estimated reading time would include initial set-ups. No case is left in the queue at the end of these shifts, which is consistent with the company practice of zero backlogs between shifts to ensure quality of care.[4] Third, we drop the observations corresponding to four radiologists, each of whom had fewer than 100 cases in the remaining sample. This leaves us with a minimum of 239 cases per radiologist, which facilitates convergence of the model estimation and ensures that the fixed effects do not introduce bias into unconditional probit estimates, as we describe in the Econometric Models section. The final sample for our regression analysis includes 2,408,218 cases interpreted by 91 radiologists. Table 1 contains the list of variables, and Table 2 displays summary statistics and correlations among variables in the final sample, where the unit of analysis is the individual case being interpreted by a radiologist.

[2] Expected processing time (EPT) is calculated as the average reading time for the given case type for the focal radiologist.
[3] A case is a repetition if it is of the same type (technology and anatomy) as the previous case read by the radiologist. We do not consider a repetition of the anatomy categories “body” or “other” as a repetition, since each case in these categories is unique. We compute whether there is a repetition using the full sample, before imposing the restrictions described in the next section.
[4] Alternative cutoffs for shift breaks (20, 40, 90, 480 minutes) provide similar results (see Extensions and Robustness Checks).

4.2.1. Dependent Variables

Deviation from the assigned queue order. DEVIATION is a binary variable indicating whether the individual deviated from the assigned order when selecting the current case. This variable takes the value of one if the job selected was not the first one in the queue and the value of zero otherwise. In our sample, doctors deviate from the order in which the cases were assigned to them 42% of the time.

Completion (Read) Time. We capture performance using the amount of time the radiologist spends reading a particular case. We estimate the reading time (READTIME) for a particular case as the difference between the time when the current case and the prior case of this radiologist are completed. When a case is not available in the queue when the radiologist submits the previous case, we use flow time, measured as the time difference between when the case becomes available (i.e., when the case is assigned to the radiologist) and when the case is completed (i.e., when the radiologist submits the interpretation report). This calculation could overestimate reading time for the first case in a shift; because of this, as described above, we ignore the first case of each shift for each radiologist.
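The two dependent variables can be derived from assignment and completion timestamps alone. The following is a minimal sketch of that derivation for a single radiologist, not the authors' code: the `Case` record, the `annotate` function, and the field names are hypothetical, timestamps are assumed to be integer minutes, and reading is assumed to start when the prior case is submitted (or when the case arrives, if the queue was empty at that moment).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    case_id: str       # hypothetical identifier
    assigned_at: int   # minute the case entered the radiologist's queue
    completed_at: int  # minute the report was submitted


def annotate(cases: List[Case]) -> List[dict]:
    """Return one record per case, in completion order, for one radiologist."""
    done = sorted(cases, key=lambda c: (c.completed_at, c.assigned_at))
    rows = []
    finished = set()
    prev_done: Optional[int] = None
    for c in done:
        # Assumed start of reading: submission of the prior case, or arrival of
        # this case if it was not yet in the queue (the flow-time fallback).
        start = c.assigned_at if prev_done is None else max(prev_done, c.assigned_at)
        # Queue at selection time: assigned, unfinished cases in assignment order.
        queue = sorted(
            (q for q in cases if q.case_id not in finished and q.assigned_at <= start),
            key=lambda q: q.assigned_at,
        )
        rows.append({
            "case_id": c.case_id,
            "qsize": len(queue),
            # DEVIATION = 1 when the selected case was not first in the queue.
            "deviation": int(queue[0].case_id != c.case_id),
            "readtime": c.completed_at - start,
        })
        finished.add(c.case_id)
        prev_done = c.completed_at
    return rows


example = [
    Case("A", assigned_at=0, completed_at=4),
    Case("B", assigned_at=1, completed_at=9),
    Case("C", assigned_at=2, completed_at=6),
    Case("D", assigned_at=20, completed_at=23),
]
rows = annotate(example)
```

In this toy queue, case C is read before the earlier-assigned case B, so it is flagged as a deviation, while case D arrives after the queue has emptied and its reading time falls back to flow time. The sample restrictions described above (queues of at least two cases, shift breaks) would be applied to these records afterwards.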
If cases were left in the queue between shifts, our method for calculating reading time could overestimate reading times; however, in our setting, no case is available in the queue when the last case of a shift is completed. The time stamp is at the minute level; that is, for each case, we know the minute in which it is assigned to the radiologist and the minute in which the radiologist completes the reading. When a case exits the system within the same minute it enters or the prior case is submitted, the time difference will have a zero value. As described below, our empirical models use the natural log of read time as the dependent variable. Because the logarithm of zero is not defined, a convention when dealing with logarithmic transformations is to add an arbitrary fixed constant before taking its logarithm. Following this convention, we add a constant equal to one to all values, which in practice is equivalent to rounding up the estimated reading time. The average estimated reading time per case is 3.75 minutes.

4.3 Econometric Models

We examine workers’ decisions to deviate from the assigned queue order and the subsequent performance implications by estimating an endogenous treatment-regression model (Heckman 1978; Maddala 1983) composed of two equations: a probit model for the treatment (i.e., the decision to deviate from the first case in the queue) and a log-linear regression model for performance:

$$DEVIATION_{ij} = \begin{cases} 1 & \text{if } \mathbf{X}_{ij}\boldsymbol{\beta} + \varepsilon_{ij} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$$\ln READTIME_{ij} = \delta\, DEVIATION_{ij} + \mathbf{X}'_{ij}\boldsymbol{\beta}' + u_{ij} \tag{3}$$

Here $i$ and $j$ denote radiologists and cases, respectively; $\mathbf{X}_{ij}$ and $\mathbf{X}'_{ij}$ are vectors of covariates; and $\varepsilon_{ij}$ and $u_{ij}$ are random disturbance terms. The error terms $\varepsilon_{ij}$ and $u_{ij}$ are bivariate normal with mean zero and covariance matrix

$$\begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}.$$

The covariates $\mathbf{X}_{ij}$ and $\mathbf{X}'_{ij}$ include fixed effects for radiologist, case type (technology and anatomy), and calendar year; the number of years the radiologist has been working at the company (EXPERIENCE); and the case-type variety in the queue, measured as the number of different case types (defined by the unique combinations of technology and anatomy of the case) in the queue (QVARIETY). The radiologist fixed effects control for time-invariant radiologist characteristics, while the case type fixed effects control for heterogeneity in the average reading time across case types. According to Hypotheses 1 and 2, we expect $\beta_{EXPERIENCE}$ and $\beta_{QVARIETY}$ to be positive in Equation (2).

The covariates $\mathbf{X}_{ij}$ and $\mathbf{X}'_{ij}$ also include controls for (a) the number of cases read by the radiologist since the beginning of the current shift, including the current case (ORDER_IN_SHIFT), because every additional case provides warm-up as well as fatigue, so the behavior of the radiologist is expected to evolve over the course of the shift, (b) the number of images in the current case (NUM_IMAGES), since a larger number of images will involve additional reading time and should therefore affect the reading speed and possibly the likelihood to deviate, and (c) the number of cases in the radiologist’s queue from which the current case was selected (QSIZE), which represents the number of options to choose from as well as the workload, and is therefore expected to affect both the decision to deviate and the resulting performance.

In addition to these common covariates, $\mathbf{X}_{ij}$ in the deviation model includes an indicator for whether the first case in the queue does not have the shortest expected processing time within the queue (SEPT_OPPTY), an indicator for whether there is an opportunity to repeat case type by selecting a case from the queue other than the first case (REPEAT_OPPTY), and fixed effects for the case type of the first case in the queue.
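The queue-level covariates defined above can be computed from the ordered list of case types in the queue at the moment of selection. The sketch below is illustrative only: the function name `queue_covariates`, the tuple representation of a case type, and the `ept` lookup (average reading time per case type for the focal radiologist) are assumptions, and the exclusion of the “body” and “other” anatomy categories follows the paper's definition of a repetition.

```python
from typing import Dict, List, Optional, Tuple

CaseType = Tuple[str, str]  # (technology, anatomy); a hypothetical encoding

# Anatomy categories that the paper does not count as repetitions.
NON_REPEATABLE = {"body", "other"}


def queue_covariates(
    queue: List[CaseType],          # case types in assigned order, first = head
    prev_type: Optional[CaseType],  # type of the case read just before
    ept: Dict[CaseType, float],     # expected processing time per case type
) -> dict:
    first = queue[0]
    shortest = min(ept[t] for t in queue)
    # SEPT_OPPTY: the head of the queue is not the shortest expected case.
    sept_oppty = int(ept[first] > shortest)
    # REPEAT_OPPTY: the head does not repeat the prior type, but a later case does.
    can_repeat = prev_type is not None and prev_type[1] not in NON_REPEATABLE
    repeat_oppty = int(can_repeat and first != prev_type and prev_type in queue[1:])
    return {
        "QSIZE": len(queue),
        "QVARIETY": len(set(queue)),
        "SEPT_OPPTY": sept_oppty,
        "REPEAT_OPPTY": repeat_oppty,
    }


ept = {("CT", "head"): 4.0, ("XR", "chest"): 1.5, ("MR", "spine"): 7.0}
queue = [("CT", "head"), ("XR", "chest"), ("CT", "head"), ("MR", "spine")]
covs = queue_covariates(queue, prev_type=("XR", "chest"), ept=ept)
```

Here the queue holds four cases of three distinct types, the head of the queue is not the shortest expected case, and a chest X-ray later in the queue would repeat the previous case type, so both opportunity indicators equal one.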
According to Hypotheses 3 and 4, we expect $\beta_{SEPT\_OPPTY}$ and $\beta_{REPEAT\_OPPTY}$ to be positive. Finally, $\mathbf{X}'_{ij}$ in the performance model also includes an indicator for whether the queue was empty when the previous case was finished (RESTART), to account for the restarting (warm-up) effects after being idle for a short period within the shift. This is infrequent in our sample, representing only 1% of the cases, and is not included in the deviation model because it perfectly predicts the outcome.

We note four important points regarding the empirical specification. First, the simultaneous-equation system takes into account the fact that the deviation decision is determined within the model rather than predetermined. Given that DEVIATION is endogenous, the method of least squares should not be applied to estimate the performance equation because those estimators would not only be biased but also inconsistent. The maximum likelihood estimator of the simultaneous-equation system presented is consistent (Heckman 1978; Maddala 1983). Second, since we do not have a measure of radiologists’ experience prior to joining the company, Equation (3) uses the exponential learning curve model, which prevents bias from our lack of information about prior experience (Lapré and Tsikriktsis 2006). Third, we note that although the maximum likelihood estimator in the presence of fixed effects in discrete choice models shows a finite sample bias when the number (T) of observations per individual is very small (i.e., incidental parameters problem), the bias is negligible for large T, as it declines rapidly as T increases beyond 3 (Greene 2004). Since we have a deep panel, with 26,464 cases per radiologist, on average, and at least 239 cases per radiologist, the only problem estimating unconditional fixed effects in our setting is computational.
Finally, the dependent variable in the performance model is completion time, which would suggest that a proportional hazard model or other class of survival model could be used (Lu et al. 2014). Survival models are used to study time duration until a certain event happens (time to completion in our case), where a missing data problem occurs because the starting point and/or the time the event happens is outside the sample period. We do not have any censoring or truncation problem, as we observe assignment and completion time for all cases interpreted by the company during the sample period. Hence, a log-linear model is appropriate.

5. Results

5.1. The Determinants of Deviations

To investigate the drivers of deviations from the queue, probit maximum likelihood estimates of Equation (2) are shown in Panel A of Table 3. Because our dependent variable is binary, we use probit regression but note that a linear probability model yields the same inferences. Average marginal effects (AME) are provided next to the corresponding coefficients. All specifications include fixed effects for radiologist, year, type of the case currently being interpreted, and type of the case that was first in the queue when the current case was selected. Standard errors are clustered at the radiologist level.

In the baseline model (Columns 1a and 1b), we only include the controls. The estimated coefficient on the length of the queue (QSIZE) is positive and significant; it may be that larger queues offer more opportunities to deviate or that they create workload pressures that lead a radiologist to choose to deviate. It is possible that the use of discretion could be part of what leads to the eventual “speed up” effect that prior literature has observed with larger queues (KC and Terwiesch 2009; Staats and Gino 2012).
The magnitude of the average marginal effect implies that one more case pending in the queue increases the probability of deviating by 2.19 percentage points (a 5.23% increase when compared to the sample average of 41.9%). In addition, the coefficient estimate for the position within the radiologist’s shift (ORDER_IN_SHIFT) is statistically significant, albeit economically small, while the coefficient estimate for the number of images (NUM_IMAGES) is not statistically significant.

To test the hypothesis that worker characteristics affect adherence to the assigned work order, radiologist experience (EXPERIENCE) is included in Column 2. The results show that higher tenure at the company is associated with a higher likelihood of deviation, which supports Hypothesis 1. An additional year of experience increases the probability of deviating by 7.98 percentage points (a 19.05% increase when compared to the sample average of 41.9%). Note that we measure experience by the number of years that the radiologist has worked at the company, as it allows us to capture experience prior to our sample period and it is a common measure in the literature (Tucker, Nembhard and Edmondson 2007). Using the volume of cases interpreted by the radiologist as an alternative measure of experience provides similar results, as shown in Column 1a of Table 5.

Turning to queue variety, Column 3 adds the number of different case types available within the radiologist’s queue (QVARIETY). The different task types within the queue represent alternative options for the individual. As predicted by Hypothesis 2, greater queue variety is positively and significantly associated with deviations. With each additional case type added to the queue of a particular length, the probability of deviating from the queue, all else equal, increases by about 2.57 percentage points (a 6.13% increase when compared to the sample average of 41.9%).
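The relative increases quoted in parentheses follow from dividing each average marginal effect (in percentage points) by the sample-average deviation rate. A short check of that arithmetic, using only the figures reported above (the variable names are illustrative):

```python
# Sample-average probability of deviating, in percent (from the text).
BASE_RATE = 41.9

# Average marginal effects in percentage points, as reported for
# QSIZE (Column 1), EXPERIENCE (Column 2), and QVARIETY (Column 3).
ame_pp = {"QSIZE": 2.19, "EXPERIENCE": 7.98, "QVARIETY": 2.57}

# Relative increase = marginal effect / baseline rate, expressed in percent.
relative_pct = {name: round(100 * pp / BASE_RATE, 2) for name, pp in ame_pp.items()}

for name, pct in relative_pct.items():
    print(f"{name}: +{ame_pp[name]} pp is a {pct}% increase over {BASE_RATE}%")
```

The computed values reproduce the 5.23%, 19.05%, and 6.13% figures in the text.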
To unpack the value of variety within the queue, we next look at two particular options of deviation strategies: deviations towards the shortest cases and deviations towards case-type repetition (batching). To explore a Shortest Expected Processing Time (SEPT) policy as a reason to deviate from the assigned order, Column 4 includes an indicator for whether the first case in the queue is inconsistent with such a policy and there is, therefore, an opportunity to follow it by deviating (SEPT_OPPTY). We find that the predicted probability of deviating from the assigned order is higher when the first case in the queue is not the shortest case in the queue, supporting Hypothesis 3. The average marginal effect suggests that having an opportunity to follow a SEPT policy by deviating boosts the probability of deviation by 3.85 percentage points, increasing the average predicted probability of deviation from 39.95% to 43.80%.

According to Hypothesis 4, individuals are more likely to deviate when the first case in the queue is not of the same type as the predecessor but the remainder of the queue offers an opportunity to repeat (REPEAT_OPPTY). Column 5 offers support for this hypothesis. The average marginal effect indicates that having an opportunity to batch by deviating from the queue increases the probability of deviation by 1.51 percentage points, increasing the average predicted probability from 42.11% to 43.62%. Note that Hypotheses 1 to 4 continue to be supported after the simultaneous introduction of the covariates discussed, as shown in Column 5.

5.2. The Impact of Deviations on Performance

To study the impact of deviations on performance, we estimate the system of simultaneous equations (2) and (3) using the maximum likelihood method described above. Panel B in Table 3 presents the performance regressions where the deviation behavior is modeled according to the probit specification in the corresponding column in Panel A.
All specifications include radiologist, year, and case-type fixed effects, and standard errors are clustered by radiologist. According to Column 1, for each additional case in the queue (QSIZE), the average reading time decreases, all else equal, by about 3.3% on average. Reading time increases for each additional image (NUM_IMAGES) included in the case by about 18.8% and decreases over the course of a given shift (ORDER_IN_SHIFT), though by a small amount on a case-by-case level. On average, reading time per case more than doubles after a temporary period of zero workload due to restarting effects (RESTART).

Turning to our main independent variable (DEVIATION), we find that, on average, the exercise of discretion is associated with slower (i.e., worse) reading times. Compared to cases that are first in queue, cases that are deviations take 13% longer on average. This supports Hypothesis 5B rather than Hypothesis 5A in our setting. These results provide evidence of the cost of exercising discretion and call for managerial and academic attention.

Though these empirical results suggest that the reorganization of work through deviations from the queue tends to worsen performance, deviations from the queue remain frequent. Why would individuals take actions possibly against their own interests? To understand why individuals would deviate if it hurts performance, and how the attributes of individual workers and the queue affect the impact of deviations on performance, the following models include measures corresponding to the four factors identified as drivers of deviations (Hypotheses 1 to 4).

Worker Experience

We first test for the heterogeneous effect of deviations across different levels of worker experience. Although the learning, overconfidence and avoiding monotony effects of experience point to a greater number of deviations, it is less clear how this will affect performance.
With respect to learning, there may be learning about how to deviate more effectively over time. If experience leads to improved knowledge about when to deviate, then in turn this could lead to better performance. Alternatively, it is possible that overconfidence could lead to slower completion times when an individual deviates. In particular, if an individual deviates toward tasks in a manner that unknowingly creates additional setup or processing time, performance may decline. The same logic could hold true for avoiding monotony-inspired deviations, as these are not based on underlying knowledge differences. However, if more experienced individuals are more likely to be bored by their work, as it gets more repetitive, then it is at least possible that a deviation could generate positive motivation that might overcome this decline in performance. Disentangling these effects is ultimately an empirical question.

Column 2 in Panel B (Table 3) explores how experience affects the performance impact of deviations. We find evidence of learning-by-doing; on average, an additional year of experience (EXPERIENCE) decreases reading time per case by 5.4%, holding all else constant. Furthermore, tenure at the company ameliorates the negative effect of deviation on performance, consistent with learning effects and suggesting that senior radiologists choose better types of deviations. Because of the interaction between the treatment (DEVIATION) and moderating covariate (EXPERIENCE), the estimates do not directly represent the average treatment effects [i.e., the average difference of the potential outcomes of treatment (deviations) and control (no deviations)]. The average treatment effects (predictive margins) derived from this model are shown in Table 4.A.
At each integer level of years of experience, deviations tend to have a higher predicted mean reading time than cases in which the radiologist follows the assigned queue order, suggesting that learning about how to deviate does not overcome the cost of deviating within our sample. The predicted mean reading time for deviations after three years at the company is equivalent to the prediction for no-deviations for newcomers (chi2(1)=1.29, p-value = 0.2555), suggesting that the deviation penalty is large enough to suppress the learning effect from three years of experience.

To understand the overall performance impact of deviations, we need to consider both the individual impact of each deviation and the frequency with which they occur. To do this, we consider how productivity changes for a radiologist over the course of a year. We combine the results from Panels A and B in Table 3 and Table 4.A. We find that a one-year increase in experience from the average (two years) increases the frequency of deviation by 8 percentage points, from 41.9% to 49.9%. At the same time, it reduces the penalty associated with each deviation, from a 12.5% increase in reading time (compared to cases that are first in queue) to an 11.5% increase. Combined, these results suggest that a one-year increase of experience from the average results in a 5% decrease in reading times, on average. Thus, experience still leads to better performance, but the improvement is smaller than it would have been if the deviation tendency had not increased. Given that, on average, a radiologist interprets approximately 11,000 cases per year, the time saved during a year by the additional year of experience corresponds to 35 hours. If the deviation tendency had not increased, the additional year of experience would have delivered a productivity boost of 40 hours, that is, five extra hours.

Queue Variety

The effectiveness of deviations may depend on the variety of case types within the queue.
As argued in the discussion of Hypothesis 2, greater queue variety offers more opportunities for the worker to implement alternative types of improvements in the workflow. As the number of options increases, the ability to execute a superior deviation strategy also goes up (at least weakly), hence improving the effectiveness of deviation via lower setup or processing time. On the other hand, as variety increases, both the search cost incurred investigating the queue and the risk of choice overload (Monsell 2003) increase, thus possibly leading to slower decisions regarding which task to select (i.e., higher task ordering time) and, therefore, increased completion times.

To test the performance implications of deviations from the queue depending on the content of the queue, the results in Column 3 (Panel B, Table 3) add queue variety and the interaction effect between deviation and queue variety. For a one-unit increase in the number of unique case types within the queue (QVARIETY), there is a 2.5% decrease in completion time. We find that the interaction term is negative and significant, consistent with faster completion times for deviations selected under higher queue variety. Table 4.B shows the average treatment effects of deviations versus non-deviations, estimated from the model in Column 3, for each value of queue variety. Deviations correspond to higher average reading times for each level of variety. In terms of predicted mean reading time, not deviating when there is only one case type within the queue is roughly equivalent to deviating in the face of five unique case types (chi2(1)=0.14, p-value = 0.7096), suggesting that the addition of approximately four case types to the queue is required to offset the deviation penalty.
To further understand the mechanism through which the contents of the queue affect the effectiveness of the exercise of discretion, we next look at the impact of the two specific deviation strategies that depend on the contents of the queue: (1) to take advantage of an opportunity to choose the shortest case within the queue and (2) to repeat a given case type.[5]

[5] Note that the two strategies facilitated by the contents of the queue that we analyze are differently impacted by queue variety. SEPT is always possible (and so queue variety does not affect the ability of the worker to execute this strategy), but the case type corresponding to SEPT changes. As queue variety increases, the EPT of cases corresponding to SEPT goes down, because the shortest case within the queue tends to become shorter as more case types become available, in general. On the other side, REPEAT is not always possible. As queue variety increases, the probability of REPEAT being possible goes up.

Shortest Expected Processing Time (SEPT)

Theory does not predict a specific direction of the effect of a SEPT policy on speed. On one hand, SEPT can increase motivation, yielding a positive impact on productivity. On the other hand, following SEPT could increase set-up costs or create choice overload that consumes an individual’s working memory. Therefore, following a SEPT policy may impact efficiency either positively or negatively. In terms of deviations, there are reasons to believe that deviations towards SEPT may be particularly time-consuming, as the search cost of exploring the queue not only includes looking at the case-mix, but also mental estimation of the reading time of each case type and comparison across cases. For example, while deviations towards batching only require an individual to search through the queue until a case of a particular type is identified, SEPT involves going through the queue computing the expected processing time of each case type and contrasting it to the shortest one identified so far.

In Column 4 of Panel B (Table 3), we include the indicator variable SEPT, which equals one if the case read corresponds to the case type with the shortest expected processing time in the queue. We calculate SEPT by examining the average reading time for this case type for this radiologist (note that this is the same approach used to calculate SEPT_OPPTY in the deviation analysis). According to the corresponding average treatment effects (shown in Table 4.C), following SEPT increases the predicted mean reading time by 5% and 7% for non-deviations and deviations, respectively. The deviation penalty corresponds to 11% and 10% of the reading time when following SEPT and otherwise, based on the second and first rows of the table, respectively. Therefore, our results suggest that following a SEPT policy tends to hurt performance in general, and tends to increase the performance penalty of deviations.

Repetition of Case Type

As discussed above, one common reason cited by radiologists for deviation was the desire to batch cases of the same type. Batching is commonly taught in operations management, specifically in contexts where repetition of an activity can help avoid additional setup costs due to the physical or cognitive set ups associated with switching between tasks. Research documents that repetition is associated with improved performance (e.g., Staats and Gino 2012); this is the argument behind why an individual would choose to deviate towards batching (Hypothesis 4). To analyze the impact of batching on performance in our setting disregarding the deviation choice (and hence, without a deviation model), we run an ordinary least squares model of completion time.
The results in Columns 2 and 3 of Table 5 include the indicator variable REPEAT, which equals one if the prior case was of the same type (technology and anatomy) as the current one, and zero otherwise. We find that, on average, holding everything else constant, reading times per case tend to be 1.7% lower for those cases that are repetitions of the prior case type than for those cases that are not repetitions. This confirms that the general finding of batching being associated with improved performance holds in this setting. Given these results, as well as the aforementioned received wisdom of operations management, knowledgeable individuals might conclude that they should use their discretion to increase the number of repetitions (and reduce the number of case-type “set ups”) by reordering tasks, justifying why individuals might have a tendency towards batching. The question thus becomes whether these deviations towards batching are desirable. To answer this question, we return to our simultaneous-equations system (Column 5 in Table 3 Panel B). Table 4.D shows the predicted mean reading times depending on whether the case is a REPEAT of the prior case type and whether the case represents a DEVIATION from the assigned queue order. Again, the results corroborate that batching is generally associated with superior performance. First, the predicted mean reading time per case is 1% lower when there is a natural repetition of case type (bottom left cell) than when there is neither repetition nor deviation (top left cell, chi2(1) = 7.11, p-value = 0.0077). Second, conditional on deviating, the predicted mean read time is 2% lower when there is a repetition (bottom right cell) than otherwise (top right cell, chi2(1) = 15.22, p-value = 0.0001). 
However, batching tends to hurt performance when it is the result of queue reordering; the predicted mean reading time is 8% higher when the radiologist deviates from the queue by taking a case that creates a repetition (bottom right cell) compared to cases in which a radiologist neither deviates nor batches (top left cell, chi2(1) = 21.18, p-value < 0.0001). Finally, as indicated by the first row of the table, deviating from the queue by taking a case that is not a repetition is associated with a 10% increase in predicted mean reading time, which represents an even larger deviation penalty (chi2(1) = 39.60, p-value < 0.0001). Overall, these results suggest that deviations towards batching are less detrimental than other deviations but still have a negative net effect on performance compared to adhering to the original order. Hence, contrary to expectations that commonly overlook the costs of exercising discretion, we find that radiologists should forgo deviations aimed at taking advantage of batching, as the benefit of repetition does not compensate for the cost associated with reordering the queue in this setting.

5.4. Extensions and Robustness Checks
To examine the robustness of our findings, we report results using alternative estimation methods and alternative criteria for defining the sample. First, Column 4 in Table 5 shows the estimation of the full specification of the deviation model (Column 5, Panel A, Table 3) using a linear probability model instead of probit. Second, we report the results including fixed effects for the client hospital in Column 5 of Table 5. In our setting, there is no incentive to prioritize cases based on clients, as the client base is extensive and the company does not systematically offer preferential treatment to any of them.
Consistent with this, and alleviating concerns regarding prioritization of cases based on preferential treatment of certain hospitals, the results are robust to the inclusion of hospital fixed effects. Finally, we use alternative shift-break windows. Columns 1 to 4 in Table 6 report the main model (presented in the last column of Table 3) for shift-break cutoffs of 20, 40, 90, and 480 minutes, respectively. The 480-minute (8-hour) cutoff is used to represent a daily shift. These break windows alter both the values of ORDER_IN_SHIFT, which we recalculate for the shift identified by the corresponding breaks, and the size of the sample, as we drop the first case of each shift as indicated in the sample construction description above. Our results are robust to each of these alternative measures of shift and the subsequent sample restriction.

6. Discussion and Conclusion
Granting workers discretion over task ordering challenges the universal assumption that job scheduling is a managerial decision. Though prior literature studies task-scheduling policies from the perspective of a central planner, those who execute the tasks often have discretion over the actual order in which they are performed. In practice, either by constraint or by choice, delegation of task-scheduling decisions is common and results in individuals self-scheduling their work. Because research on discretionary task ordering is lacking, little is known about how managers should manage scheduling under circumstances in which such discretion exists. Understanding when and how individuals exercise this discretion will inform system design, the decision of whether to grant discretion, how to nudge behavior towards more appropriate uses of discretion, and how to adjust central policies to incorporate the responses expected from front-line employees.
We look into this underexplored territory by analyzing the drivers and consequences of the exercise of queue discretion in a setting where deviations from the queue are observable. By examining a proprietary dataset from a teleradiology company in which radiologists are assigned a queue of cases to interpret but are not restricted to follow that order, we are able to establish when deviations tend to occur, how they affect operational performance, and the conditions under which they are more effective. Examining the role of individual characteristics, we find that individuals are more likely to exercise discretion (via deviations from the queue) when their experience is greater. This finding is consistent with the view that experience leads to superior knowledge, which could lead to an ability to better identify high-leverage opportunities for deviation. It is also consistent with a perspective that individuals with more experience will have higher self-confidence and, therefore, be more likely to deviate. With respect to the contents of the queue, we show that, keeping the size of the queue fixed, individuals tend to adhere to the queue order less frequently as queue variety goes up, consistent with greater options creating more opportunities for radiologists to exercise their task scheduling discretion. We also show doctors have a higher probability of exercising discretion when doing so creates an opportunity to follow either of two particular strategies—SEPT and batching. The exercise of discretion (via deviations from the queue) has a net negative effect in our empirical setting, at least in the short-term. Given that we find both that deviations are related to worse performance, on average, and that individuals with more experience and more opportunity (e.g., queue variety) deviate more, the question remains whether deviations under those conditions are productive. We find that the answer is a qualified no. 
Task deviations by more experienced workers tend to lead to faster completion times than task deviations by less experienced workers, and deviations under greater variety of case types within the queue also tend to lead to faster completion times than deviations under lower variety in the queue. Deviations in each of these cases are less detrimental than other deviations but are still related to worse performance compared to the case of no deviation at all. That is, we find deviations to be related to worse operational performance, even when they are pursued to take advantage of strategies that are assumed to be (and, in the case of batching, actually are) beneficial to performance. Even in the case of such highly beneficial scheduling strategies as batching, the benefits of discretion via deviation may not compensate for the costs of exercising it.

6.1. Why Deviate When It Hurts: The Discretion Fallacy
Given these findings, why would individuals deviate from the assigned order? In the case of following a SEPT policy, there are likely two main reasons. One is that individuals might pursue an alternative performance metric, perhaps seeking to reduce waiting time rather than reading time. In our setting, this was not the strategy the company expected or wanted the radiologists to follow, as the waiting time was kept under control by adjusting the pool of radiologists at work. Another, more pertinent reason to follow a SEPT policy is that individuals might misperceive the implications of this approach. Note that the bivariate correlation between DEVIATION and LnREADTIME is negative; it is not until we control for case-type fixed effects that the relationship becomes positive. Hence, precisely because doctors are switching towards shorter cases (when they choose the shortest cases within the queue), this policy may create the mental illusion of working faster.
Conditional on a particular case type being selected, the reading time tends to be higher when it has the shortest expected processing time within the queue, but this may be difficult for the individual to estimate. The case of batching illustrates a phenomenon that could disentangle the paradox surrounding the detrimental exercise of discretion for other types of deviations as well. We argue that one possible explanation is that individuals have the illusion of improving performance by exercising discretion because they underestimate (or fail to consider entirely) the time required to do so (e.g., the time required to look through the queue to determine the exact case to complete next). This task selection time is an opportunity cost, and as such, might be expected to be overlooked, as evaluating opportunity costs requires decision-makers to account for unrealized, implicit options, which are often neglected (Frederick et al. 2009). Seeking to increase their speed, individuals may exercise discretion in a manner that they believe will improve their workflow only to find that they have reduced their speed. These results identify a discretion fallacy, as individuals actively pursue strategies that would in fact have been beneficial for performance were it not for the unrecognized “transaction costs” of pursuing those strategies (e.g., queue reordering costs). Our findings provide evidence of the vulnerabilities of self-management and the potential value of using centralized management in settings where the costs of exercising discretion are particularly high relative to the benefits. 6.2. Contributions We make three main contributions to the literature. First, we analyze the operational implications of discretionary queue management. Scheduling tasks is a critical determinant of employee and organizational productivity (Pinedo 2012). 
Most studies on task (job) scheduling examine contexts where scheduling is solely a managerial decision, yet this is often not the case in practice, as front-line workers frequently have discretion in scheduling. To understand the drivers and consequences of discretion in task scheduling, we analyze the choices that individuals make regarding the order in which to execute tasks. In so doing, we identify negative performance implications of deviating from the assigned task ordering. Future analytical work should incorporate the endogeneity of deviations. Allowing for deviations may more accurately reflect many situations, and accounting for such deviations may lead to unexpected recommendations on how best to structure work.

Second, we uncover a discretion fallacy. Individuals consistently deviate from the order of tasks within their queues even though this reordering results in slower completion times on average. We find that deviation penalizes an individual’s speed of work by an amount that is equal in magnitude to the performance benefit of three years of experience (one year above the mean experience of two) or four additional case types in the queue (one more than the mean of three). Overall, deviations take 13% longer on average, underscoring the importance of understanding this phenomenon. Our investigation of batching suggests a potentially important reason why this behavior occurs. When cases are exogenously, or naturally, batched in the queue, processing times are faster. When individuals deviate in a manner that is consistent with batching, however, we find that completion times are slower than in situations where they do not deviate. Thus, it is possible that individuals believe that deviations will prove more productive even though, in practice, they are not. This creates an opportunity to examine ways to encourage individuals to deviate less.
This may be through nudges that provide information, while retaining discretion for individuals, or alternatively shifting the locus of control in a system toward centralized ordering of tasks. More research is needed in each case. Third, we identify several conditions under which an individual is more likely to deviate – namely, when that individual has greater experience and when the queue offers more options including more variety in the queue, shorter cases, and batching opportunities. Each of these results offers important theoretical grounding for system design. Future analytical and empirical work can consider questions such as work scheduling not only among workers with different levels of experience but also among pools of workers with varying levels of heterogeneity on other dimensions. In addition, assignment heuristics that consider the variety in a worker’s queue may be a valuable way to assign subsequent tasks in a manner that improves productivity. 6.3. Limitations As is the case with any study, ours has limitations of which a reader should be aware. First, despite the fact that we are able to observe over 2.4 million queue choices, we are still examining one organization in which these doctors work. Although we would expect our findings to generalize to many other settings, research that analyzes alternative settings would prove valuable. For example, other settings with lower task-selection cost relative to the benefits of queue reordering may exhibit a positive effect of deviations on performance. Our data does not allow us to test this hypothesis, but future research could explore the conditions under which deviations might be effective. Second, we examine the immediate consequences of deviations on performance. Prior work notes that task assignment may have different short- vs. long-term effects (Staats and Gino 2012). In a similar vein, more work is needed to understand the longer-term implications of deviation on performance. 
Third, future research should explore alternative reasons to deviate based on principles of operations management. We find a SEPT policy and batching to be relevant, and other empirical settings might allow tests regarding other generally applicable practices. Finally, although we can see the actual choices that individuals make with respect to their task ordering, we do not observe their beliefs or reasons for such choices. Future research should collect data on such reasons and manipulate conditions (e.g., a field experiment studying batching) to better understand the mechanisms that drive performance.

6.4. Managerial Implications
Our paper contributes not only to the theory of operations management but also to its practice. First, our finding that deviations are, on average, related to worse performance calls for a warning to managers and workers concerning the costs of exercising certain types of discretion. Although the assigned task sequence might not be optimal, allowing front-line workers to take an active role in scheduling might not be advisable in settings where the time cost of exercising discretion exceeds the benefits of doing so. Managers should thus pay attention to the effects of deviations on productivity in their settings. In any setting, reorganizing the queue takes time, so managers should look for ways to reduce this time while maintaining the benefits from better ordering of queues. Reducing workers’ need or desire for task reordering, through either centralized queue management or more responsive software for task ordering, can be a way for managers to improve productivity. Our findings that experience and queue variety offset, though only partially, the detrimental effects of deviations on performance suggest that organizations need to take these variables into account in structuring work.
In settings such as ours, where deviations reduce performance on average, the benefits of senior employees' experience are reduced by their tendency to deviate from the queue more often, and varied queues lead to more deviation without a performance benefit. In such contexts, managers have the opportunity to improve performance by creating awareness, finding ways to persuade experienced employees to adhere to the queue sequence, and perhaps even going as far as limiting the variety and visibility of the cases within the queue. In settings where deviations are performance enhancing, organizations might look for ways to increase the amount of variety in each worker’s queue or encourage newcomers to explore ways to better organize their work. Moreover, organizations should explore how experienced employees are able to exercise discretion more efficiently to identify additional ways of improving deviation, since there may be times when discretion should be granted for reasons independent of its efficiency. Finally, managers should consider the time required to exercise discretion when using data analytics to inform prescriptions, as illustrated by the fact that the time required to reorder a queue offsets the beneficial effects of batching in our setting.

6.5. Conclusion
Despite the prevalence of discretion in practice, little is known about how workers use discretion and how to manage scheduling under circumstances in which discretion exists. Seeking to fill this gap, we consider the operational implications of discretionary queue management. We find that the discretionary reordering of queues is common in our setting, but that, on average, it has negative implications for performance. We identify conditions that lead to greater deviation and document the performance effects of these deviations.
Taken together, our results highlight the importance of discretion in queue management and identify mechanisms to improve productivity through intelligent delegation of these decisions.

7. References
Amar, M., D. Ariely, S. Ayal, C. E. Cryder and S. I. Rick (2011). "Winning the battle but losing the war: The psychology of debt management." Journal of Marketing Research 48(SPL): S38-S50.
Anand, K. S., M. F. Paç and S. Veeraraghavan (2011). "Quality–Speed Conundrum: Trade-offs in Customer-Intensive Services." Management Sci. 57(1): 40-56.
Argote, L. and E. Miron-Spektor (2011). "Organizational learning: From experience to knowledge." Organ. Sci. 22(5): 1123-1137.
Bassamboo, A. and R. S. Randhawa (2015). "Scheduling homogeneous impatient customers." Management Science.
Bendoly, E., M. Swink and W. P. Simpson (2014). "Prioritizing and monitoring concurrent project work: Effects on switching behavior." Production and Operations Management 23: 847–860.
Berman, O., R. C. Larson and E. Pinker (1997). "Scheduling workforce and workflow in a high volume factory." Management Sci. 43(2): 158-172.
Berry Jaeker, J. and A. L. Tucker (2015). "Past the point of speeding up: The negative effects of workload saturation on efficiency and quality." Management Sci.
Boudreau, J., W. Hopp, J. O. McClain and L. J. Thomas (2003). "On the interface between operations and human resources management." Manufacturing Service Oper. Management 5(3): 179-202.
Bowman, E. H. (1963). "Consistency and optimality in managerial decision making." Management Sci. 9(2): 310-321.
Brynjolfsson, E. and A. McAfee (2014). The second machine age: Work, progress, and prosperity in a time of brilliant technologies. New York, WW Norton & Company.
Campbell, D. and F. X. Frei (2011). "Market heterogeneity and local capacity decisions in services." Manufacturing Service Oper. Management 13(1): 2-19.
Fisher, M. L. and C. D. Ittner (1999). "The impact of product variety on automobile assembly operations: Empirical evidence and simulation analysis." Management Sci. 45(6): 771-786.
Frederick, S., N. Novemsky, J. Wang, R. Dhar and S. Nowlis (2009). "Opportunity cost neglect." Journal of Consumer Research 36: 553-561.
Froehle, C. M. and D. L. White (2014). "Interruption and forgetting in knowledge-intensive service environments." Production and Operations Management 23(4): 704-722.
Gal, D. and B. B. McShane (2012). "Can small victories help win the war? Evidence from consumer debt management." Journal of Marketing Research 49(4): 487-501.
Gans, N., G. Koole and A. Mandelbaum (2003). "Telephone call centers: Tutorial, review, and research prospects." Manufacturing Service Oper. Management 5(2): 79-141.
Grant, A. M., Y. Fried, S. K. Parker and M. Frese (2010). "Putting job design in context: Introduction to the special issue." J. of Organ. Behav. 31: 145-157.
Greene, W. (2004). "The behaviour of the maximum likelihood estimator of limited dependent variable models in the presence of fixed effects." The Econometrics Journal 7: 98–119.
Hackman, J. R. and G. R. Oldham (1976). "Motivation through the design of work: Test of a theory." Organ. Behav. Human Decision Processes 16(2): 250-279.
Halsted, M. J. and C. M. Froehle (2008). "Design, implementation, and assessment of a radiology workflow management system." American Journal of Roentgenology 191(2): 321-327.
Heckman, J. J. (1978). "Dummy endogenous variables in a simultaneous equation system." Econometrica 46: 931–959.
Hopp, W. J., S. M. R. Iravani and G. Y. Yuen (2007). "Operations systems with discretionary task completion." Management Sci. 53(1): 61-77.
Huckman, R. S. and G. P. Pisano (2006). "The firm specificity of individual performance: Evidence from cardiac surgery." Management Sci. 52(4): 473-488.
Huckman, R. S., B. R. Staats and D. M. Upton (2009). "Team familiarity, role experience, and performance: Evidence from Indian software services."
Management Sci. 55(1): 85-100.
KC, D. and B. R. Staats (2012). "Accumulating a portfolio of experience: The effect of focal and related experience on surgeon performance." Manufacturing Service Oper. Management 14(4): 618-633.
KC, D. and C. Terwiesch (2009). "Impact of workload on service time and patient safety: An econometric analysis of hospital operations." Management Sci. 55(9): 1486-1498.
King, D. L., D. I. Ben-Tovim and J. Bassham (2006). "Redesigning emergency department patient flows: application of Lean Thinking to health care." Emergency Medicine Australasia 18(4): 391-397.
Kremer, M., B. Moritz and E. Siemsen (2011). "Demand forecasting behavior: System neglect and change detection." Management Sci. 57(10): 1827-1843.
Lapré, M. A. and I. M. Nembhard (2010). "Inside the organizational learning curve: Understanding the organizational learning process." Foundations and Trends in Technology, Information and Operations Management 4(1): 1-103.
Lu, Y., A. Hechling and M. Olivares (2014). "Productivity analysis in services using timing studies."
MacDuffie, J. P. (1997). "The road to "Root Cause": Shop-floor problem-solving at three auto assembly plants." Management Sci. 43(4): 479-502.
Maddala, G. S. (1983). Limited-Dependent and Qualitative Variables in Econometrics. Cambridge, Cambridge University Press.
Monsell, S. (2003). "Task switching." Trends in cognitive sciences 7(3): 134-140.
Narayanan, S., S. Balasubramanian and J. M. Swaminathan (2009). "A matter of balance: Specialization, task variety, and individual learning in a software maintenance environment." Management Sci. 55(11): 1861-1876.
Phillips, R., A. S. Şimşek and G. v. Ryzin (2015). "The effectiveness of field price discretion: Empirical evidence from auto lending." Management Sci.
Pierce, L., D. Snow and A. McAfee (2014). "Cleaning house: The impact of information technology monitoring on employee theft and productivity." Management Sci.
Pinedo, M. L. (2012).
Scheduling: Theory, Algorithms, and Systems. New York, Springer.
Pinedo, M. L. and B. P.-C. Yen (1997). "On the design and development of object-oriented scheduling systems." Annals of Operations Research 70(1): 359–378.
Pisano, G. P., R. M. J. Bohmer and A. C. Edmondson (2001). "Organizational differences in rates of learning: Evidence from the adoption of minimally invasive cardiac surgery." Management Sci. 47(6): 752-768.
Powell, A., S. Savin and N. Savva (2012). "Physician workload and hospital reimbursement: Overworked physicians generate less revenue per patient." Manufacturing Service Oper. Management.
Roberti, R., E. Bartolini and A. Mingozzi (2015). "The fixed charge transportation problem: An exact algorithm based on a new integer programming formulation." Management Sci.
Saghafian, S., W. J. Hopp, M. P. Van Oyen, J. S. Desmond and S. L. Kronick (2014). "Complexity-augmented triage: A tool for improving patient safety and operational efficiency." Manufacturing & Service Operations Management 16(3): 329-345.
Schultz, K. L., T. Schoenherr and D. Nembhard (2010). "An example and a proposal concerning the correlation of worker processing times in parallel tasks." Management Sci. 56(1): 176-191.
Schweitzer, M. E. and G. P. Cachon (2000). "Decision bias in the newsvendor problem with a known demand distribution: Experimental evidence." Management Sci. 46(3): 404-420.
Shumsky, R. A. and E. J. Pinker (2003). "Gatekeepers and referrals in services." Management Sci. 49(7): 839-856.
Staats, B. R., D. J. Brunner and D. M. Upton (2011). "Lean principles, learning, and knowledge work: Evidence from a software services provider." J. of Operations Management 29(5): 376-390.
Staats, B. R. and F. Gino (2012). "Specialization and variety in repetitive tasks: Evidence from a Japanese bank." Management Sci. 58(6): 1141-1159.
Stanovich, K. E. and R. F. West (2000). "Individual differences in reasoning: Implications for the rationality debate?"
Behavioral and Brain Sciences 23: 645-665.
Tan, T. F. and S. Netessine (2014). "When does the devil make work? An empirical study of the impact of workload on server's performance." Management Sci. 60(6): 1574-1593.
Taylor, F. W. (1911). The Principles of Scientific Management. New York, Harper & Brothers.
Tost, L. P., F. Gino and R. Larrick (Forthcoming). "Power, competitiveness, and advice taking: Why the powerful don't listen." Organ. Behav. Human Decision Processes.
Truong, V. A. (2015). "Optimal advance scheduling." Management Science 61(7): 1584-1597.
Tucker, A. L. (2007). "An empirical study of system improvement by frontline employees in hospital units." Manufacturing Service Oper. Management 9(4): 492–505.
Tucker, A. L., I. M. Nembhard and A. C. Edmondson (2007). "Implementing new practices: An empirical study of organizational learning in hospital intensive care units." Management Sci. 53(6): 894-907.
van Donselaar, K., V. Gaur, T. van Woensel, R. Broekmeulen and J. Fransoo (2010). "Ordering behavior in retail stores and implications for automated replenishment." Management Sci. 56(5): 766-784.
Wang, L., I. Gurvich, J. A. V. Mieghem and K. J. O’Leary (2015). "Collaboration and professional labor productivity: An empirical study of physician workflows in a hospital."
Zhu, E., T. G. Crainic and M. Gendreau (2014). "Scheduled service network design for freight rail transportation." Operations Research 62(2): 383-400.

8. Tables and Figures

Table 1. Variables

Measure: Description

Ln READTIME: Amount of time (measured in minutes) the radiologist spends working on the current case (logged).
DEVIATION (indicator): Indicates whether the case is not the first one in the queue ordered according to the scheduling policy established by management. That is, whether the worker deviates from the assigned order and selects a case other than the first one in the queue.
QSIZE: Length of the radiologist’s queue (i.e., a count of pending jobs) when starting to read the current case (that is, when selecting the current case to be read next).
NUM_IMAGES: Number of images in the current case.
ORDER_IN_SHIFT: Number of cases read by the radiologist since the beginning of the current shift, including the current case. That is, the case number within the shift.
RESTART (indicator): Indicator variable equal to one if the queue was empty when the previous case was finished, and zero otherwise.
EXPERIENCE: Employee job tenure in years (with decimals), measured from the number of days that the radiologist has been working at the company when interpreting the current case.
QVARIETY: Variety of case types in the queue, measured as the number of different case types in the queue.
REPEAT_OPPTY (indicator): Indicator variable equal to one when the first case in the queue is different from the case just finished but it is possible to repeat by deviating from the queue (i.e., the remainder of the queue contained a case of the same type as the one just finished), and zero otherwise. That is, this variable represents an opportunity to batch by deviating.
REPEAT (indicator): Indicator variable equal to one when the current case is of the same type as the case just finished by this radiologist (batching), and zero otherwise. This is computed considering all the cases (full sample).
SEPT_OPPTY (indicator): Indicator variable equal to one when the first case in the queue is not of the Shortest Expected Processing Time (SEPT) type within the queue, and zero otherwise. This represents the situations when the individual can follow a SEPT policy by deviating. The expected processing time (EPT) for a given case is calculated as the average reading time for this case type for this radiologist.
SEPT (indicator): Indicator variable equal to one when the case is of the Shortest Expected Processing Time (SEPT) type within the queue, and zero otherwise. As above, the expected processing time (EPT) for a given case is calculated as the average reading time for this case type for this radiologist.
Calendar year indicators: To control for time trends, the empirical analysis includes fixed effects for calendar year.
Radiologist indicators: Indicators for each radiologist, which control for time-invariant characteristics of individuals.
Case type indicators: Case type (defined by the technology and anatomy) fixed effects.

Notes. The unit of observation is a radiologist's case.

Table 2. Descriptive Statistics (n=2,408,218)

     Variable          Mean     Std. Dev.  Min    Max
(1)  Ln READTIME       1.295    0.727      0      3.434
(2)  DEVIATION         0.419    0.493      0      1
(3)  QSIZE             5.553    3.791      2      58
(4)  NUM_IMAGES        1.414    0.598      1      17
(5)  ORDER_IN_SHIFT    57.653   51.612     2      459
(6)  RESTART           0.013    0.114      0      1
(7)  EXPERIENCE        1.906    1.234      0      5.501
(8)  QVARIETY          3.19     1.479      1      14
(9)  SEPT_OPPTY        0.635    0.481      0      1
(10) SEPT              0.455    0.498      0      1
(11) REPEAT_OPPTY      0.206    0.404      0      1
(12) REPEAT            0.125    0.331      0      1

     Variable          (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)      (10)     (11)
(1)  Ln READTIME
(2)  DEVIATION         -0.073*
(3)  QSIZE             -0.273*  0.200*
(4)  NUM_IMAGES        0.201*   -0.088*  -0.031*
(5)  ORDER_IN_SHIFT    -0.177*  0.042*   0.159*   0.043*
(6)  RESTART           0.145*   0.136*   -0.084*  -0.032*  -0.019*
(7)  EXPERIENCE        -0.054*  0.068*   0.109*   -0.007*  -0.053*  -0.013*
(8)  QVARIETY          -0.250*  0.201*   0.765*   -0.146*  0.100*   -0.078*  0.107*
(9)  SEPT_OPPTY        -0.014*  0.169*   0.213*   0.136*   0.048*   0.001    0.043*   0.324*
(10) SEPT              -0.083*  0.048*   -0.189*  -0.392*  -0.056*  0.069*   -0.014*  -0.288*  -0.545*
(11) REPEAT_OPPTY      -0.068*  0.104*   0.193*   -0.002*  0.007*   -0.016*  0.039*   0.246*   0.308*   -0.155*
(12) REPEAT            -0.111*  0.070*   0.024*   -0.257*  -0.040*  0.011*   0.013*   0.012*   -0.116*  0.307*   0.203*

* Significant at the 5% level.

Table 3.
Drivers and Performance Implications of Deviations

PANEL A: Determinants of Deviations
Dependent Variable: DEVIATION

| Variable | (1a) Coefficients | (1b) AME | (2a) Coefficients | (2b) AME | (3a) Coefficients | (3b) AME | (4a) Coefficients | (4b) AME | (5a) Coefficients | (5b) AME |
|---|---|---|---|---|---|---|---|---|---|---|
| QSIZE | 0.0620*** (0.0041) | 0.0219 | 0.0615*** (0.0041) | 0.0217 | 0.0400*** (0.0039) | 0.0141 | 0.0409*** (0.0039) | 0.0144 | 0.0408*** (0.0039) | 0.0143 |
| NUM_IMAGES | -0.0110 (0.0099) | -0.0039 | -0.0092 (0.0101) | -0.0032 | -0.0089 (0.0101) | -0.0031 | -0.0083 (0.0101) | -0.0029 | -0.0081 (0.0101) | -0.0029 |
| ORDER_IN_SHIFT | -0.0004*** (0.0001) | -0.0001 | -0.0005*** (0.0001) | -0.0002 | -0.0004*** (0.0001) | -0.0001 | -0.0004*** (0.0001) | -0.0001 | -0.0004*** (0.0001) | -0.0001 |
| EXPERIENCE (H1) | | | 0.2266*** (0.0163) | 0.0798 | 0.2171*** (0.0157) | 0.0763 | 0.2175*** (0.0157) | 0.0764 | 0.2178*** (0.0157) | 0.0765 |
| QVARIETY (H2) | | | | | 0.0731*** (0.0050) | 0.0257 | 0.0597*** (0.0049) | 0.0210 | 0.0578*** (0.0049) | 0.0203 |
| SEPT_OPPTY (H3) | | | | | | | 0.1090*** (0.0072) | 0.0385 | 0.1026*** (0.0072) | 0.0362 |
| REPEAT_OPPTY (H4) | | | | | | | | | 0.0428*** (0.0045) | 0.0151 |
| Log Pseudolikelihood | -1,479,077 | | -1,475,568 | | -1,473,704 | | -1,472,905 | | -1,472,726 | |

PANEL B: Impact of Deviations on Performance
Dependent Variable: Ln READTIME

| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| QSIZE | -0.0337*** (0.0023) | -0.0336*** (0.0023) | -0.0240*** (0.0025) | -0.0239*** (0.0024) | -0.0239*** (0.0024) |
| NUM_IMAGES | 0.1719*** (0.0062) | 0.1714*** (0.0062) | 0.1714*** (0.0061) | 0.1710*** (0.0062) | 0.1710*** (0.0062) |
| ORDER_IN_SHIFT | -0.0003*** (0.0001) | -0.0003*** (0.0001) | -0.0003*** (0.0001) | -0.0004*** (0.0001) | -0.0004*** (0.0001) |
| RESTART | 0.8540*** (0.0202) | 0.8516*** (0.0199) | 0.8362*** (0.0201) | 0.8327*** (0.0200) | 0.8318*** (0.0200) |
| DEVIATION (H5) | 0.1160*** (0.0160) | 0.1356*** (0.0160) | 0.1897*** (0.0183) | 0.1409*** (0.0197) | 0.1353*** (0.0199) |
| EXPERIENCE | | -0.0560*** (0.0127) | -0.0533*** (0.0127) | -0.0498*** (0.0125) | -0.0496*** (0.0124) |
| DEVIATION * EXPERIENCE | | -0.0089** (0.0035) | -0.0079** (0.0035) | -0.0078** (0.0035) | -0.0077** (0.0035) |
| QVARIETY | | | -0.0298*** (0.0026) | -0.0207*** (0.0026) | -0.0208*** (0.0025) |
| DEVIATION * QVARIETY | | | -0.0098*** (0.0018) | -0.0108*** (0.0020) | -0.0104*** (0.0019) |
| SEPT | | | | 0.0511*** (0.0055) | 0.0498*** (0.0055) |
| DEVIATION * SEPT | | | | 0.0127** (0.0060) | 0.0166*** (0.0056) |
| REPEAT | | | | | -0.0124*** (0.0047) |
| DEVIATION * REPEAT | | | | | -0.0113** (0.0047) |
| Log Pseudolikelihood | -3,765,706 | -3,762,503 | -3,757,028 | -3,755,367 | -3,755,112 |

Notes. Panel A reports the results from maximum-likelihood probit estimation. Panel B reports the results from linear regression performance model from maximum-likelihood estimation where DEVIATION is modeled based on the corresponding probit specification from Panel A (see text for details). The unit of observation is a radiologist's case. The number of observations is 2,408,218. In Panel A, the dependent variable is whether the case was a deviation (DEVIATION). In Panel B, the dependent variable is performance, measured as the natural log of the reading time per case, Ln READTIME. Robust standard errors (in parentheses) are clustered by radiologist. AME, average marginal effect, in brackets. All specifications include a constant term and case type (technology and anatomy), radiologist and year fixed effects. The probit specifications also include fixed effects for the case type of the first case in the queue. *10% statistical significance; **5% statistical significance; ***1% statistical significance.

Table 4.A. Average Treatment Effects (Predictive Margins) by Experience

| EXPERIENCE | No Deviation (DEVIATION = 0) | Deviation (DEVIATION = 1) |
|---|---|---|
| 1 year | 3.9398 | 4.4723 |
| 2 years | 3.7252 | 4.1913 |
| 3 years | 3.5222 | 3.9281 |
| 4 years | 3.3304 | 3.6814 |
| 5 years | 3.1490 | 3.4501 |

Notes. This table reports the predicted mean reading time (in minutes) per case based on the performance model in Column 2 of Panel B in Table 3. The unit of observation is a radiologist's case. The number of observations is 2,408,218.

Table 4.B.
Average Treatment Effects (Predictive Margins) by Queue Variety

| QVARIETY (Number of unique case types in the queue) | No Deviation (DEVIATION = 0) | Deviation (DEVIATION = 1) |
|---|---|---|
| 1 | 3.9370 | 4.6448 |
| 2 | 3.8212 | 4.4645 |
| 3 | 3.7088 | 4.2911 |
| 4 | 3.5998 | 4.1244 |
| 5 | 3.4939 | 3.9643 |
| 6 | 3.3912 | 3.8103 |
| 7 | 3.2915 | 3.6624 |
| 8 | 3.1947 | 3.5201 |
| 9 | 3.1007 | 3.3834 |
| 10 | 3.0096 | 3.2521 |
| 11 | 2.9211 | 3.1258 |
| 12 | 2.8352 | 3.0044 |
| 13 | 2.7518 | 2.8877 |
| 14 | 2.6709 | 2.7756 |

Notes. This table reports the predicted mean reading time (in minutes) per case based on the performance model in Column 3 of Panel B in Table 3. The unit of observation is a radiologist's case. The number of observations is 2,408,218.

Table 4.C. Average Treatment Effects (Predictive Margins) by Whether Following a Shortest Expected Processing Time (SEPT) Policy

| | No Deviation (DEVIATION = 0) | Deviation (DEVIATION = 1) | Difference |
|---|---|---|---|
| Not the Shortest Case (SEPT = 0) | 3.6998 | 4.0684 | 0.3685 (10%) |
| Shortest Case (SEPT = 1) | 3.8938 | 4.3365 | 0.4427 (11%) |
| Difference | 0.1940 (5%) | 0.2681 (7%) | |

Notes. This table reports the predicted mean reading time (in minutes) per case based on the performance model in Column 4 of Panel B in Table 3. The unit of observation is a radiologist's case. The number of observations is 2,408,218.

Table 4.D. Average Treatment Effects (Predictive Margins) by Whether Repeating Case Type

| | No Deviation (DEVIATION = 0) | Deviation (DEVIATION = 1) | Difference |
|---|---|---|---|
| No Repetition (REPEAT = 0) | 3.7906 | 4.1804 | 0.3898 (10%) |
| Repetition (REPEAT = 1) | 3.7437 | 4.0823 | 0.3386 (9%) |
| Difference | -0.0469 (-1%) | -0.0981 (-2%) | |

Notes. This table reports the predicted mean reading time (in minutes) per case based on the performance model in Column 5 of Panel B in Table 3. The unit of observation is a radiologist's case. The number of observations is 2,408,218. The first column compares repetitions versus non-repetitions when there is no deviation. The second column compares repetition versus non-repetition when there is deviation.
The difference within row represents deviation versus non-deviation.

Table 5. Extensions: Experience Measure, Impact of Repetitions on Performance, Linear Probability Model, and Hospitals Fixed Effects

| | (1a) | (1b) | (2) | (3) | (4) | (5a) | (5b) |
|---|---|---|---|---|---|---|---|
| Dependent Variable | DEVIATION | LnREADTIME | LnREADTIME | LnREADTIME | DEVIATION | DEVIATION | LnREADTIME |
| Model | EXPERIENCE: Volume of Cases | EXPERIENCE: Volume of Cases | OLS | OLS | LPM | Hospital Fixed Effects | Hospital Fixed Effects |
| QSIZE | 0.0608*** (0.0042) | -0.0331*** (0.0024) | -0.0309*** (0.0021) | -0.0229*** (0.0023) | 0.0145*** (0.0014) | 0.0410*** (0.0038) | -0.0236*** (0.0024) |
| NUM_IMAGES | -0.0101 (0.0100) | 0.1714*** (0.0062) | 0.1712*** (0.0064) | 0.1709*** (0.0064) | -0.0031 (0.0036) | -0.0070 (0.0098) | 0.1672*** (0.0062) |
| ORDER_IN_SHIFT | -0.0005*** (0.0001) | -0.0003*** (0.0001) | -0.0003*** (0.0001) | -0.0004*** (0.0001) | -0.0001*** (0.0000) | -0.0004*** (0.0001) | -0.0004*** (0.0001) |
| RESTART | | 0.8509*** (0.0201) | 0.8769*** (0.0184) | 0.8697*** (0.0180) | | | 0.8320*** (0.0200) |
| DEVIATION | | 0.1237*** (0.0169) | | | | | 0.1378*** (0.0194) |
| EXPERIENCE | 0.0047*** (0.0005) | -0.0018*** (0.0005) | -0.0500*** (0.0127) | -0.0459*** (0.0125) | 0.0770*** (0.0057) | 0.2114*** (0.0155) | -0.0487*** (0.0123) |
| DEVIATION*EXPERIENCE | | -0.0003* (0.0001) | | | | | -0.0080** (0.0034) |
| QVARIETY | | | | -0.0223*** (0.0022) | 0.0201*** (0.0017) | 0.0582*** (0.0048) | -0.0209*** (0.0025) |
| DEVIATION*QVARIETY | | | | | | | -0.0105*** (0.0019) |
| SEPT_OPPTY | | | | | 0.0389*** (0.0032) | 0.1002*** (0.0072) | |
| SEPT | | | | 0.0551*** (0.0052) | | | 0.0495*** (0.0054) |
| SEPT*DEVIATION | | | | | | | 0.0165*** (0.0055) |
| REPEAT_OPPTY | | | | | 0.0139*** (0.0014) | 0.0406*** (0.0047) | |
| REPEAT | | | -0.0170*** (0.0049) | -0.0170*** (0.0048) | | | -0.0102** (0.0046) |
| REPEAT*DEVIATION | | | | | | | -0.0116** (0.0047) |

Notes. This table reports the results from maximum-likelihood estimation (Columns 1 and 5) of the probit deviation model (a) and the performance model (b); ordinary least squares estimation of the performance model (Columns 2 and 3), and the linear probability model (LPM) of the deviation equation (Column 4). The unit of observation is a radiologist's case.
The number of observations is 2,408,218. Log Likelihood is -3,763,000, -1,543,000 and -3,742,000 in Columns 1, 4 and 5, respectively. R-squared is 0.2616, 0.2596 and 0.1336 for the results in Columns 2, 3 and 4, respectively. Robust standard errors (in parentheses) are clustered by radiologist. In Column 1, EXPERIENCE is the number of cases interpreted by this radiologist up to the current case, in thousands. All specifications include a constant term and case type (modality and anatomy), radiologist and year fixed effects. The specifications in Columns 1a, 4, and 5a also include fixed effects for the case type of the first case in the queue. The specifications in Columns 5a and 5b also include hospital fixed effects. *10% statistical significance; **5% statistical significance; ***1% statistical significance.

Table 6. Robustness Checks: Alternative Shift-Break Cutoffs

| | (1a) | (1b) | (2a) | (2b) | (3a) | (3b) | (4a) | (4b) |
|---|---|---|---|---|---|---|---|---|
| Dependent Variable | DEVIATION | LnREADTIME | DEVIATION | LnREADTIME | DEVIATION | LnREADTIME | DEVIATION | LnREADTIME |
| Shift-Break Cutoff | 20 minutes | 20 minutes | 40 minutes | 40 minutes | 90 minutes | 90 minutes | 480 minutes | 480 minutes |
| QSIZE | 0.0413*** (0.0040) | -0.0235*** (0.0024) | 0.0404*** (0.0038) | -0.0238*** (0.0024) | 0.0401*** (0.0038) | -0.0238*** (0.0024) | 0.0400*** (0.0038) | -0.0238*** (0.0025) |
| NUM_IMAGES | -0.0121 (0.0100) | 0.1616*** (0.0058) | -0.0063 (0.0102) | 0.1712*** (0.0062) | -0.0055 (0.0103) | 0.1713*** (0.0062) | -0.0055 (0.0103) | 0.1712*** (0.0062) |
| ORDER_IN_SHIFT | -0.0004*** (0.0001) | -0.0003*** (0.0001) | -0.0003*** (0.0001) | -0.0004*** (0.0001) | -0.0003** (0.0001) | -0.0004*** (0.0001) | -0.0002* (0.0001) | -0.0004*** (0.0001) |
| RESTART | | 0.7975*** (0.0198) | | 0.8665*** (0.0205) | | 0.8860*** (0.0234) | | 0.8956*** (0.0242) |
| DEVIATION | | 0.1279*** (0.0188) | | 0.1359*** (0.0199) | | 0.1376*** (0.0199) | | 0.1369*** (0.0199) |
| EXPERIENCE | 0.2185*** (0.0157) | -0.0466*** (0.0124) | 0.2171*** (0.0157) | -0.0492*** (0.0124) | 0.2171*** (0.0156) | -0.0487*** (0.0124) | 0.2171*** (0.0155) | -0.0487*** (0.0125) |
| DEVIATION*EXPERIENCE | | -0.0078** (0.0035) | | -0.0078** (0.0034) | | -0.0080** (0.0033) | | -0.0078** (0.0033) |
| QVARIETY | 0.0594*** (0.0049) | -0.0195*** (0.0025) | 0.0571*** (0.0048) | -0.0209*** (0.0025) | 0.0561*** (0.0048) | -0.0212*** (0.0025) | 0.0560*** (0.0048) | -0.0212*** (0.0025) |
| DEVIATION*QVARIETY | | -0.0105*** (0.0019) | | -0.0104*** (0.0019) | | -0.0103*** (0.0019) | | -0.0101*** (0.0019) |
| SEPT_OPPTY | 0.1063*** (0.0071) | | 0.0957*** (0.0074) | | 0.0925*** (0.0075) | | 0.0924*** (0.0075) | |
| SEPT | | 0.0470*** (0.0055) | | 0.0501*** (0.0055) | | 0.0503*** (0.0055) | | 0.0500*** (0.0055) |
| SEPT*DEVIATION | | 0.0175*** (0.0055) | | 0.0163*** (0.0055) | | 0.0158*** (0.0055) | | 0.0157*** (0.0055) |
| REPEAT_OPPTY | 0.0427*** (0.0047) | | 0.0489*** (0.0051) | | 0.0561*** (0.0058) | | 0.0580*** (0.0060) | |
| REPEAT | | -0.0106** (0.0046) | | -0.0126*** (0.0047) | | -0.0129*** (0.0047) | | -0.0128*** (0.0047) |
| REPEAT*DEVIATION | | -0.0119** (0.0047) | | -0.0116** (0.0048) | | -0.0117** (0.0048) | | -0.0117** (0.0048) |

Notes. This table reports the results from maximum-likelihood estimation of the probit deviation model (a) and the performance model (b). The unit of observation is a radiologist's case. The number of observations is 2,387,032, 2,412,544, 2,416,535 and 2,417,453 in Columns 1, 2, 3, 4, respectively. Robust standard errors (in parentheses) are clustered by radiologist. All specifications include a constant term and case type (technology and anatomy), radiologist and year fixed effects. The probit specifications also include fixed effects for the case type of the first case in the queue. *10% statistical significance; **5% statistical significance; ***1% statistical significance.
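
As a reading aid for Tables 4.C and 4.D, the reported differences and percentages can be recomputed from the four predicted means in each table. The sketch below is not the authors' code: the helper names `diff_and_pct` and `summarize` are hypothetical, and it assumes that each percentage is the difference divided by the baseline cell (no deviation, or SEPT = 0 / REPEAT = 0), an assumption that matches the rounded figures in both tables.

```python
# Predicted mean reading times (minutes), copied from Tables 4.C and 4.D.
# Keys are (group, deviation): group is SEPT or REPEAT (0/1), deviation is DEVIATION (0/1).
TABLE_4C = {(0, 0): 3.6998, (0, 1): 4.0684, (1, 0): 3.8938, (1, 1): 4.3365}
TABLE_4D = {(0, 0): 3.7906, (0, 1): 4.1804, (1, 0): 3.7437, (1, 1): 4.0823}


def diff_and_pct(baseline, comparison):
    """Return (difference in minutes, difference as a percent of the baseline)."""
    diff = comparison - baseline
    return diff, 100.0 * diff / baseline


def summarize(means):
    """Recompute the 'Difference' column and row of a 2x2 table of predicted means."""
    out = {}
    for group in (0, 1):
        # Within-row comparison: deviation versus no deviation.
        out[("deviation effect", group)] = diff_and_pct(means[(group, 0)], means[(group, 1)])
    for deviation in (0, 1):
        # Within-column comparison: group = 1 versus group = 0.
        out[("group effect", deviation)] = diff_and_pct(means[(0, deviation)], means[(1, deviation)])
    return out


for name, table in (("Table 4.C", TABLE_4C), ("Table 4.D", TABLE_4D)):
    for key, (diff, pct) in summarize(table).items():
        print(f"{name} {key}: {diff:+.4f} min ({pct:+.0f}%)")
```

Recomputed differences can differ from the reported ones in the fourth decimal place because the published means are themselves rounded.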