Comments: Reviewer #3:

I think this is an excellent methodological paper regarding the use of animal movement data, and it has many potential applications in veterinary epidemiology. I would appreciate it if the manuscript mentioned the software used and whether the authors are willing to provide access to the model code. This would allow the model to be used on different datasets with the same structure.

MCMC methods are flexible tools for Bayesian inference. They do, however, require some care in terms of initialization, proposal functions, convergence, etc. Also, the program may not be directly applicable unless the data are structured in exactly the same way, which they generally would not be. There is, for instance, no consensus on what information is registered in the databases held within countries in the EU. At the same time, the parts of the program will not make much sense by themselves. We therefore do not believe it would be very helpful to publish the actual code. We believe that as long as the paper is published with the appendices, it will contain sufficient information to aid other researchers who might want to use the method or parts of it. In the revised manuscript, we discuss this briefly in the Discussion, where we address advantages and disadvantages of the method presented. We have also included information on the software used at the end of the Material and Methods section.

Introduction

Page 4 Lines 11 - 12. The last sentence stated that the super spreaders may cause stochastic transmission dynamics. However, the meaning of this sentence is not clear. Maybe the authors can add a small example or interpretation of this. Also, if you remove this sentence, it doesn't change anything in the paper. This concept (stochastic transmission dynamics) is not discussed further.

We agree with the reviewer that this concept is left unexplained, and since the manuscript is already quite long we have chosen to remove this rather than elaborating on the issue.
As the reviewer points out, it is not essential for the study.

Page 4 Lines 21 - 24. Revise English

We have rewritten this to make our point clearer.

Page 4 Lines 24 - 25. The authors mentioned that this paper addresses the same questions answered by classical network analysis, but it is not clear why the proposed methodology is different from those studies. A discussion of the main advantages and disadvantages of the methodology proposed here, as compared to the methodologies used in other studies, would add value to this paper. There are several papers that characterize contact networks and use population demographics, production management, etc. to characterize risk nodes. I think that it would be very valuable to put this paper in the context of previous studies from a methodological point of view.

We made some small alterations to the text here and instead incorporated a larger section on the relation to network analysis in the Discussion section.

Page 5 Lines 15 - 18. I would suggest changing contact patterns to contact probabilities (the objective is clearer if it is written as in page 7 Lines 7 - 8).

We have rewritten this so that it now reads that the aim is “…to use a probabilistic model to investigate how the contact pattern is influenced by…”. Hence, we mean that we investigate the contact pattern through the contact probabilities.

Also I would suggest removing the last objective of this paper. This manuscript can very well discuss how the information obtained from the model can be applied to other disease spread models, or risk assessment models, without showing results from a very specific simulation model. I would leave this up to the editor to decide if this is necessary or not.

We believe that the simulation model, while simplistic, clarifies some implications of our analysis. The message of this part may also be more accessible to some of the readers of Prev. Vet. Med., a point that was made by the editor of our previous paper (Lindström et al.
2010) in response to a similar issue raised by a reviewer for that paper. In our opinion, the manuscript loses some of its message if this is removed.

Material and Methods

Page 6 - Lines 13 - 17. Include here the abbreviations for production types and use them throughout the document (including Tables - some tables use abbreviations and some don't).

We agree with the reviewer that abbreviations can often be useful to reduce the amount of text. In this case, however, we argue that using the full names serves a purpose, because the names of the production types are explanatory and provide the reader with intuitive information about the production types when mentioned. Using abbreviations throughout the paper will lead to constant flipping to the page where these are first introduced and hence reduce the readability of the manuscript. If, however, it is the opinion of the editor that we should use abbreviations, we are happy to comply.

Page 6 Line 24 - Typo - replace Euclidian by Euclidean

Fixed.

Page 7 - Lines 15 - 23. I would recommend adding some basic interpretation (and example) for h and Q as done for cuI in page 8 (lines 6 - 7). Maybe bring the text from Table 3 here so the reader doesn't need to find this when reading the results. This might improve the whole understanding of the model in this section.

We have rewritten the section to clarify the interpretation of h and Q. We agree that the differences between h and Q are perhaps most clearly illustrated by an example and have therefore added an explicit example (using Farrow-to-finish and Fattening herds) in the Discussion section. We believe it makes more sense to provide this after presenting the results.

It would also improve the understanding of the model to describe how the farm sizes for the "missing information" group were (as it is understood from page 6 lines 13 - 17) estimated.
This might be quite simple for readers with more background in mathematical statistics but not necessarily for most of the audience for this journal.

Missing information only refers to missing production types, not herd size and location. To avoid confusion, we changed the text to state that explicitly.

Page 6 - Headings and sub-headings. Change heading 2.2 to Parameters estimation and model description (since this is the order they are presented).

We have not changed this according to the suggestion of the reviewer. The section does start with a description of the model and model parameters, which is then followed by the parameter estimation. We could, following the reviewer's comments, change to “Model, parameters and parameter estimation”, but we do not believe that this would be useful.

The following sub-headings would improve reading: 2.2.1 Parameter estimation (Page 7 after Line 8 add - Weight of production types (v); After line 13 - Dependence on production types; After line 23 - Dependence on farm sizes and After line 8 - Dependence on movement distance). Then add 2.2.2 Model description. Move the paragraph in Page 9 lines 8 - 19 to the section weight of production types.

Following the suggestion of the reviewer, section 2.2 now includes the sub-headings
2.2.1 Weight on production types
2.2.2 Dependence on production types
2.2.3 Herd size dependence
2.2.4 Distance dependence
2.2.5 Contact probability model (this is mainly the content of the section previously called 2.2.1 Model specifics. Following the advice of the reviewer, we however moved the first paragraph to 2.2.1 Weight on production types)
2.2.6 Comparing observed and predicted movement distances (formerly denoted 2.2.2).

Page 9. Equation 1 replace "l" in denominator by "k" since "k" in this situation is l (k-1), this would be the same as presented by Lindström et al 2010.

We have rewritten the equation so that we explicitly say that we normalize v-hat by summation over all production types l not equal to m.
We believe that the indexation using l clarifies the point that for every farm and production type, v-hat is calculated by normalization over all production types that the farm could have.

Page 13 lines 20 - 24; Page 14 Line 1. The relationship between the number of holdings and the number of infections recorded up to 4 weeks is not clear. Please clarify.

We have rewritten and hopefully clarified the point.

Results

Present results in the same order as presented in MM (it will improve consistency and reading of the manuscript): 1) weight of production types (v); 2) Production type dependent parameters (h and Q); 3) Farm size dependent parameters - c; 4) distance dependent parameters (V and k).

We have changed the order according to the suggestion of the reviewer. Note that this also involved changing the order of Table 2 and 3.

Then present some basic results of the contact probabilities, to have an idea of the type of distributions that were used in the simulation study. It can be done by selecting a production type combination at a given distance and farm size.

For this to be useful for the reader, we believe we would have to use several examples and illustrate the differences graphically. Further, to be informative, this would need to be accompanied by information on the specific farms involved. The conditions of our use of the data supplied by the Swedish Board of Agriculture do not allow for this.

Page 17 - simulation of disease transmission. The coefficients of this model represent the expected number of new infections; however, in this section they are described as probabilities. Having a lower expected number of new infections doesn't mean a lower probability. It could be the case that a particular group has a large probability of generating few infections. More details about the Poisson model might help to clarify this. However, the authors stated that the coefficients represent the expected number of new infections.
The reviewer is indeed right that we have somewhat misused the word “probability” in this section, and we have changed it accordingly.

Discussion

I would suggest that, for consistency reasons, the discussion should follow the same structure as presented in MM and Results: 1) Parameter estimation: a) weight of production types, production type dependence, etc. and then contact probabilities.

Largely, the present structure follows this, with the exception that we discuss Q and h before v. We believe that this can be justified by the fact that v is not the main focus of this study, and we want to start the Discussion section with something that is. Alternatively, we could have started with Q and h also in the MM and Results sections, but it is somewhat difficult to introduce and explain Q and h without first presenting v. Hence, we have not rearranged the Discussion section in exactly the same way as the MM and Results sections.

Page 18 - Lines 16 - 17. Why is it mentioned that negative relationships are generally unexpected results? A better explanation of this parameter (c) might help to clarify its interpretation. For instance, for a Farrow-to-Finish premise with a large number of pigs, the probability of incoming movements could be expected to be low since this production type is producing its own animals, so size is negatively related to the probability of incoming movements. The negative relationship with outgoing movements is clear for this particular situation since slaughter movements are not included in this study, and this situation was explained in the manuscript. So, in summary, I think that negative relationships are expected results but depend on what type of production system these dependences are compared to. Maybe this needs more explanation and clarification.

Following the advice of the reviewer, we have largely rewritten this section, stressing the point that negative values may seem unexpected only without knowledge about the system.
We point out that this result is indeed sometimes expected, and exemplify this with the Farrow-to-finish production type. We also provide an interpretation of c in this section and stress that this study does not include shipments to slaughterhouses.

I would like to see in the discussion less text on the simulation model (whose results depend very much on the simulation used) and more discussion of the overall modelling approach, compared with other approaches that have been used to characterize movement patterns. One particular parameter used in other studies and related to disease spread is the outdegree of farms. This parameter is different from the parameters estimated in this study and can have an important role when identifying "super spreader" holdings. A premise with many movements to one premise might have a less significant role as compared to holdings with fewer movements but to many premises. I think that some discussion about other methodologies and parameters will be more useful than interpreting results only applicable to one dataset and based on a particular simulation model.

Following the suggestion of the reviewer, we have removed much of the content that discusses the simulation model (including removal of an entire paragraph) but left some of it that we believe helps the reader in interpreting the parameters. Also, we have added content to put our study in relation to other studies, in particular advantages and disadvantages compared to network analysis.

Another main point for the discussion is the issue of the missing information, a big problem in many databases. Although here all those records with missing data are incorporated, it is not clear what would be the final "utility" of that information and how the parameter estimations are affected by this "heterogeneous" group.
Also it would be useful to have an idea of the relative importance of the factors evaluated in this study on the probability of contacts, which could indicate the amount of bias you might expect in your contact probabilities and also justify the amount of effort needed to collect this information.

Regarding the Missing information, we have now clarified in the manuscript that the group denoted as such refers to missing information about the production type. Holdings without reported coordinates were in fact excluded, as we state in section 2.1, where we also address the database entries on herd size. While it could be argued that this should be moved to the Discussion, we choose to include it in the MM section, where the data are presented. We believe it helps the reader understand why we treated the data as we did. We very much agree with the reviewer that a sensitivity analysis of the importance of the parameters included would be very interesting. This would, however, require substantial extra content in the manuscript, which is already quite long. We do in fact have a manuscript in preparation that addresses this issue thoroughly.

Page 14 - lines 4 - 11. It is not clear how the Poisson model was evaluated to be sure that the results met the assumptions of this model.

In GLM, a Poisson error distribution with a log link function is a fairly standard approach for analyses involving natural numbers. Since we use this here more to demonstrate trends than for hypothesis testing, we do not believe the manuscript would benefit from a deeper analysis. Again, the manuscript at present is quite long and “heavy” on the methodological part.

Figures and Tables

Avoid long titles; if needed, use text as a footnote (e.g. abbreviations). However, if the abbreviations are described at the beginning of the manuscript (MM), there might be no need to explain them again in tables and figures. Use abbreviations in all tables and figures for production types.
It is usually argued that a table or figure should be able to be presented on its own, and hence abbreviations should not be left unexplained in the legend or table text. We will leave this up to the editor to decide and will adjust accordingly. We have, however, moved most of the table texts to a footnote under the tables. See also the discussion on abbreviations above.

Table 2. Title. Change "....incoming and outgoing..." to "....outgoing and incoming..." to be consistent with the second paragraph and table.

Fixed.

Table 3. Title. Move the explanation of the different values to the corresponding section in the manuscript - a footnote can be added directing the reader to the section with the explanation of these values. Remove abbreviations from the title.

See above.