Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media

When the Wikipedians Talk: Network and Tree Structure of Wikipedia Discussion Pages

David Laniado‡,∗, Riccardo Tasso∗, Yana Volkovich‡, Andreas Kaltenbrunner‡

∗ Politecnico di Milano, Dipartimento di Elettronica e Informazione, {david.laniado,tasso}@elet.polimi.it
‡ Barcelona Media – Innovation Centre, Information, Technology and Society Group, {firstname.lastname}@barcelonamedia.org

Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Talk pages play a fundamental role in Wikipedia as the place for discussion and communication. In this work we use the comments on these pages to extract and study three networks, corresponding to different kinds of interactions. We find evidence of a specific assortativity profile which differentiates article discussions from personal conversations. An analysis of the tree structure of the article talk pages allows us to capture patterns of interaction, and reveals structural differences among the discussions about articles from different semantic areas.

Introduction

Wikipedia is the largest example of collaboration on the Web, accessed and edited each day by thousands of people. Behind the most visible part of Wikipedia, i.e. the articles, there are non-encyclopedic pages which are used for coordination, discussion and personal communication among the Wikipedians. While the growth of the encyclopedia in terms of numbers of articles, edits and active users has slowed down in recent years, activity on these pages has kept increasing at a higher rate (Suh et al. 2009). In this study we focus on this less visible side of Wikipedia, in order to shed light on communication patterns that accompany collaboration on the project.

Unlike other online discussions, which often serve only the purpose of entertainment or of defending one’s point of view, the discussion on Wikipedia article talk pages has a clear objective, i.e. to reach consensus and improve the content of the corresponding article. In many cases these pages can considerably outgrow the corresponding article in size. For example, the talk page associated with the article ‘Barack Obama’ contains more than 22 000 comments, which is more than the 17 500 edits made to the article itself.

In Wikipedia there are also talk pages associated with registered users; these pages are somewhat complementary to the article discussion pages, and are used for personal communication between the Wikipedians, as a sort of public in-box. Communications in Wikipedia are part of a complex social system, where users are involved in the project to different extents and with different roles, either explicit or implicit.

Several studies have focused on the analysis of the content of talk pages (Viégas et al. 2007; Stvilia et al. 2008; Schneider, Passant, and Breslin 2010), while some researchers have studied the correlation between the presence of discussion and article quality (Kittur et al. 2007; Kittur and Kraut 2008). Kittur et al. (2007) also identify the number of edits made to a discussion page as the best indicator of conflict on the corresponding article. However, little attention has so far been devoted to the study of interaction patterns emerging from discussions on these pages. We claim that the study of these interactions on a large scale can reveal essential features of the Wikipedia community and its social structure.

In this paper we offer an extensive analysis of talk pages associated with articles and with users. We analyse the structural properties of the networks derived from interactions on these pages; in particular, the study of directed degree assortativity allows us to reveal specific patterns in the communications between the Wikipedians, which differ from the results obtained for the discussion board Slashdot. To characterize the discussions on article talk pages, we analyse their structure according to different measures, such as depth and size of the discussion threads. We show how chains of direct replies between pairs of users can be an interesting indicator of particularly contentious topics, and we report a listing of the most discussed articles, according to different criteria. Finally, we investigate the relationship between structural properties of the discussions and the corresponding semantic areas.

Experimental Setup

From a technical point of view talk pages are simple wiki pages; however, their usage has evolved over the years to fit the community needs, and the wiki text syntax has been exploited to create a forum-like environment. The freedom which is left to editors in the usage of discussion pages was a challenge for our analysis. There is no structure surrounding a single comment, nor an always valid scheme to detect its start and end. Moreover, signing comments is left to users, who can use a shortcut to add the signature containing a link to their personal page at the end of a post; for anonymous (not registered) users the signature reports their IP number. Though there are bots in charge of automatically adding missing signatures, many comments are unsigned. To extract the thread structure with comment indentation, signatures and dates, we had to deal with many different explicit and implicit conventions, changing over the years and not always followed by the users. Sometimes users reset indentation; we always consider these cases as the start of a new thread in the discussion.

For this study we relied on a complete dump of the English Wikipedia dated March 12th, 2010. In Table 1 we report some basic quantities of the data extracted.

| Quantity | Count | Share |
|---|---|---|
| #articles | 3 210 039 | |
| #edits of article pages | 402 851 686 | |
| #articles with talk page (ATP) | 871 485 | (27.1%) |
| #total comments in ATP | 11 041 246 | |
| #signed comments in ATP | 9 421 976 | (85.3%) |
| #anonymous (ip signed) comments in ATP | 1 000 824 | (9.1%) |
| #users who comment articles | 350 958 | (2.8%) |
| #registered users | 12 651 636 | |
| #user talk pages (UTP) | 1 662 818 | (13.1%) |
| #comments in UTP | 13 670 980 | |
| #signed comments in UTP | 13 493 254 | (98.7%) |
| #anonymous (ip signed) comments in UTP | 2 009 658 | (14.7%) |

Table 1: Basic quantities of the data analysed.

Wikipedia discussion networks

There are no explicit networks between users in Wikipedia. In order to study the patterns of communication and discussion, we extracted three implicit directed networks according to different types of interactions between users:

Article reply network (reply-NW): direct replies between users in article discussion pages.
User talk network (talk-NW): direct replies in user talk pages.
Wall network (wall-NW): personal messages posted on the talk page of another user.

In all networks we discard anonymous users, as IP numbers are not reliable identifiers. In Figure 1 we schematically explain how these networks are constructed. In the article reply network (Figure 1a) we establish a directed edge from a user B to a user A if B has written at least one comment indented under an entry by user A in any article discussion page. The user talk network (Figure 1b) is analogously defined, but based on the comments in user talk pages, while the wall network establishes a link from user B to user A if user B has written something on the talk page of user A.

Basic network parameters

In Table 2 we report some macroscopic features of the networks. Besides the size in terms of number of nodes, we report for each network the number of nodes having at least one outgoing or one incoming link, respectively. Interestingly, these quantities vary significantly from network to network. In the article discussions around 90% of users have replied to at least one user, while nearly 60% have received replies. In contrast, in the other two networks almost all users (wall: 98.4%, talk: 90.3%) have at least one incoming link, while many do not have any outgoing link. In particular, fewer than one in ten users in the wall network have written on another user’s talk page. This result is due to the presence of welcomers, users and bots who write a welcome message on the wall of newly registered users; this also explains the larger size of the wall network, which contains many users who are not active. For this reason, reciprocity is also lower in the wall network.

| variable | reply-NW | talk-NW | wall-NW |
|---|---|---|---|
| #nodes with edges | 204 017 | 114 258 | 1 861 702 |
| w. in-degree ≥ 1 | 121 682 | 103 147 | 1 832 168 |
| w. out-degree ≥ 1 | 182 881 | 63 334 | 177 331 |
| #edges M | 1 489 734 | 852 065 | 4 412 212 |
| size of giant comp. | 88.5% | 89.2% | 96.3% |
| mean distance | 4.10 (0.75) | 3.86 (0.69) | 4.06 (0.68) |
| maximal distance | 15 | 11 | 12 |
| clustering coeff. | 0.083 (0.19) | 0.053 (0.16) | 0.035 (0.14) |
| mean in-degree | 7.30 (29.6) | 7.46 (32.8) | 2.37 (15.75) |
| mean out-degree | 7.30 (35.2) | 7.46 (41.5) | 2.37 (103.79) |
| network density | 3.58 · 10^−5 | 6.53 · 10^−5 | 1.27 · 10^−6 |
| reciprocity | 0.44 | 0.45 | 0.15 |

Table 2: Global measures of the Wikipedia discussion and talk networks. Values in parentheses indicate standard deviations.

Network comparison

For the network comparison we modify the metric proposed by Szell, Lambiotte, and Thurner (2010), based on the Jaccard coefficient of the link overlap, such that it takes values within [0, 1] and is independent of the network densities. It measures the co-occurrence of links between users in different social networks.
More formally, let G1 = (V, E1) and G2 = (V, E2) be two networks with the same set of nodes V, and with the sets of edges E1 and E2, respectively. Then,

\[ C_{\mathrm{jaccard}} = \frac{|E_1 \cap E_2|}{|E_1 \cup E_2|} \cdot \frac{\max(|E_1|, |E_2|)}{\min(|E_1|, |E_2|)}, \]

where | · | denotes the number of elements in a set.

Figure 1: Schema of the networks construction. (a) Article reply NW; (b) User talk and wall NWs.

| | reply-NW | talk-NW | wall-NW |
|---|---|---|---|
| reply-NW | 1 | 0.11 | 0.09 |
| talk-NW | 0.11 | 1 | 0.35 |
| wall-NW | 0.09 | 0.35 | 1 |

Table 3: Jaccard coefficient between the networks.

| network | type | r | r_rand | σ_rand | Z | ASP |
|---|---|---|---|---|---|---|
| Slashdot | (out, in) | -0.035 | -0.046 | 0.00059 | 17.677 | 0.329 |
| Slashdot | (in, out) | -0.016 | -0.033 | 0.00063 | 26.613 | 0.495 |
| Slashdot | (out, out) | -0.015 | -0.038 | 0.00063 | 35.843 | 0.667 |
| Slashdot | (in, in) | -0.027 | -0.040 | 0.00057 | 24.143 | 0.449 |
| Reply | (out, in) | -0.025 | -0.019 | 0.00063 | -8.629 | -0.485 |
| Reply | (in, out) | -0.018 | -0.018 | 0.00061 | 0.062 | 0.003 |
| Reply | (out, out) | -0.027 | -0.018 | 0.00062 | -14.179 | -0.797 |
| Reply | (in, in) | -0.015 | -0.019 | 0.00063 | 6.385 | 0.359 |
| Talk | (out, in) | -0.045 | -0.030 | 0.00998 | -1.526 | -0.655 |
| Talk | (in, out) | -0.025 | -0.026 | 0.00753 | 0.109 | 0.047 |
| Talk | (out, out) | -0.042 | -0.028 | 0.00848 | -1.753 | -0.753 |
| Talk | (in, in) | -0.028 | -0.029 | 0.00894 | 0.076 | 0.033 |
| Wall | (out, in) | -0.126 | -0.087 | 5.1e-5 | -769.81 | -0.936 |
| Wall | (in, out) | -0.039 | -0.020 | 0.00020 | -93.51 | -0.114 |
| Wall | (out, out) | -0.063 | -0.043 | 7.5e-5 | -26.04 | -0.317 |
| Wall | (in, in) | -0.061 | -0.039 | 0.00026 | -84.21 | -0.102 |

We added the Slashdot results to be able to compare discussions in Wikipedia with discussions from another large online community. The Slashdot reply network contains about 80 000 users and 1 million connections; for a detailed description of this dataset, see (Gómez, Kaltenbrunner, and López 2008). None of the assortativity values computed for the talk network is statistically significant (|Z| > 2).
The wall network exhibits dissortativity according to all four measures, which points to a general tendency of socially active users to interact preferentially with users having few connections. In particular, the remarkably large absolute value observed for the (out, in)-assortativity can be attributed to the activity of users and bots who massively welcome newly registered users by writing on their personal talk pages. The reply network extracted from Wikipedia articles proves to be (out, out)- and (out, in)-dissortative, with significant Z-scores, pointing to a marked tendency of users having many outgoing links to interact preferentially with users having few connections, and vice versa. In contrast, (in, in)-assortativity is positive, revealing a tendency of users to reply more often to others having a similar in-degree. We do not observe this pattern in the Slashdot reply network, which is assortative according to all four measures¹. The difference could be due to the peculiar nature of Wikipedia article talk pages, where discussions are usually aimed at making decisions about content production according to the community policies. While a high out-degree is the result of active behaviour, i.e. replying to many users, a high in-degree is achieved by getting many replies from different users. These two measures seem to capture two distinct characteristics of influential Wikipedia users, resonating with the distinction between hubs and authorities. The Wikipedians who reply to many other users in article talk pages tend to interact mostly with users having few connections, i.e. newbies and inexperienced users, while the Wikipedians who receive replies from many users tend to interact preferentially with each other.

Table 4: Directed assortativity results for the three networks of Wikipedians and for the Slashdot reply network. Values in bold are significant (|Z| > 2).

We compare the three networks and present the results in Table 3.
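The modified Jaccard coefficient used for this comparison can be illustrated with a minimal sketch. It assumes each network is available as a set of directed (source, target) pairs over the same users; the function name and the toy edge sets are hypothetical and are not taken from the study.

```python
def modified_jaccard(edges_a, edges_b):
    """Jaccard overlap of two directed edge sets, rescaled by max/min size.

    The rescaling keeps the value within [0, 1] and makes it independent
    of how many edges each network has.
    """
    e1, e2 = set(edges_a), set(edges_b)
    if not e1 or not e2:
        raise ValueError("both networks need at least one edge")
    overlap = len(e1 & e2) / len(e1 | e2)  # plain Jaccard coefficient
    rescale = max(len(e1), len(e2)) / min(len(e1), len(e2))
    return overlap * rescale


# Toy example (hypothetical users): one shared edge out of four distinct ones.
reply_edges = {("B", "A"), ("C", "A"), ("D", "C")}
wall_edges = {("B", "A"), ("C", "B")}
print(modified_jaccard(reply_edges, wall_edges))
```

With this rescaling, a network whose edges are all contained in a larger one reaches the maximal value of 1, which the plain Jaccard coefficient would not.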
In the wall network, we discarded all users who are not present in either of the two reply networks, to keep only active users. As could be expected, the highest overlap is between the two networks extracted from user talk pages. However, it is important to point out that these two networks capture different kinds of interaction, and neither of the two is subsumed by the other. The overlap between edges in these networks of personal communications and the one extracted from articles is about 10%, indicating substantially different networks.

Directed assortativity by degree

Assortativity by degree is a basic measure of diversity in networks, quantifying the tendency of nodes to link with others having a similar number of edges (Newman 2002). This measure has been widely used to analyse various kinds of networks, and assortative mixing has been shown to be a characterising feature of social networks, with respect to technological and biological networks, which are mostly dissortative (Newman and Park 2003). In contrast, recent studies have pointed out that many online social networks tend towards dissortativity (Hu and Wang 2009).

To compute degree assortativity accounting for the direction of edges, we rely on the Assortativity Significance Profile (ASP) proposed in (Foster et al. 2010). Combining the degree types (in- and out-) of the source and target nodes we obtain a set of four assortativity measures; we denote as r(out, in) the correlation between the out-degree of the source and the in-degree of the target node of each connection, and so on. For details about these metrics we refer to (Foster et al. 2010). To assess the statistical significance of the results, we contrast each network with an ensemble of 100 randomly generated equivalents, having the same in- and out-degree sequences.
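As a rough illustration of these measures, the sketch below computes one directed assortativity value as a Pearson correlation over edges, a Z-score against a randomised ensemble, and a significance profile. Two assumptions are made here: the ensemble mean and standard deviation are passed in rather than generated, and the profile is taken to be the vector of Z-scores normalised to unit length, which is consistent with the ASP column of Table 4 but should be checked against (Foster et al. 2010). All function names are hypothetical.

```python
import math
from collections import Counter


def directed_assortativity(edges, source_kind, target_kind):
    """Pearson correlation, over all edges u -> v, between the
    `source_kind` degree of u and the `target_kind` degree of v.
    Both kinds are either "in" or "out"."""
    edges = list(edges)
    degree = {
        "out": Counter(u for u, _ in edges),
        "in": Counter(v for _, v in edges),
    }
    xs = [degree[source_kind][u] for u, _ in edges]
    ys = [degree[target_kind][v] for _, v in edges]
    n = len(edges)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if all degrees on one side are equal.
    return cov / math.sqrt(var_x * var_y)


def z_score(r, r_rand_mean, r_rand_std):
    """Deviation of r from the randomised ensemble, in standard deviations."""
    return (r - r_rand_mean) / r_rand_std


def significance_profile(z_scores):
    """Assumed ASP: the Z-score vector normalised to unit Euclidean length."""
    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores]


# Toy network (hypothetical): the user with two outgoing links replies only
# to users with a single incoming link, so r(out, in) is strongly negative.
toy_edges = [("a", "x"), ("a", "y"), ("b", "z"), ("c", "z")]
print(directed_assortativity(toy_edges, "out", "in"))
```

Generating the randomised equivalents with fixed in- and out-degree sequences (for instance by repeated edge swaps) is left out of this sketch.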
Results are reported in Table 4, together with the results for the reply network extracted from the Slashdot discussion board.

The discussion trees

In this section we focus on the shape and size of interactions in the discussion pages on Wikipedia. These interactions can be modelled in the form of discussion trees, where the root node corresponds to the article page on Wikipedia, and child nodes to comments or structural elements of the discussion pages. Unlike other online discussions, for example those observed in blogs (Mishne and Glance 2006) or at Slashdot (Gómez, Kaltenbrunner, and López 2008), the Wikipedia discussion pages do not only consist of comments, which represent the actual interactions between the users, but may also contain many structural elements, such as the division of the comments into several sub-pages, or titles and subtitles that organize the content of the discussion. We model each of these different elements as a separate node in the discussion tree. The structure of the tree reflects the hierarchy of the pages.

¹ Note that this is different from what has been reported previously in (Gómez, Kaltenbrunner, and López 2008), where no comparison with randomised networks was taken into account.
[Figure 3, left panel: size of Wikipedia discussions; legend: #comments, #users.]

What are the shapes of these discussions? The example of Figure 2 suggests that we can basically identify two patterns: comments that are placed directly after a structural node (a headline etc.) and do not receive any replies; and large chain-like subthreads of comments, containing a sequence of replies between several users. Only occasionally does a comment receive more than one reply in these subthreads, which contain about 63.4% of all comments. The remaining 36.6% of the comments are isolated: they are unanswered and do not themselves reply to another comment. To investigate these subthreads further we focus on the number of chain-like subthreads that can be found per discussion and on their lengths.
To formalise the concept of chains we consider only subthreads where exactly two users reply to each other in turn. We define as n-chains (or simply chains, if not stated otherwise) all such sequences which include at least three comments. So the shortest n-chains are of the form A ← B ← A, where A and B are two different users and the arrows indicate a reply of B to A and a back-reply from A to B. These chains can grow considerably. The longest chain in our dataset is of length 31, in the discussion page of “Central Bosnia Canton”². Figure 3 (right) shows the number of chains of different lengths. Given a chain of length k, we do not count any sub-part of it as a chain. We find that 22.7% of all comments form part of chains of length at least 3 (n-chains), while 40.7% belong only to 2-chains (they are either a parent or a reply, but not part of an n-chain). The remaining comments are isolated (1-chains). The distribution of the number of n-chains with different lengths roughly follows a power law, y ∼ x^−6 (red line in Figure 3).

In Figure 4 we show the distribution of the number of n-chains in the discussion pages. Again we find a heavy-tailed distribution, with some discussions containing several thousand chains. The distribution can be fitted with both a power-law distribution (with cut-off) with exponent 2.23 and a truncated log-normal distribution. Neither fit is rejected by a Kolmogorov-Smirnov test (see the figure legend for the corresponding p-values).

The number of chains gives us an idea about how many times a controversy arises in the article discussion. Table 5 lists the top 20 articles according to this measure.
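The chain definition above can be sketched in code. This is a minimal illustration under stated assumptions, not the extraction pipeline used in the study: comments are given as a hypothetical dictionary mapping a comment id to its parent comment id (None when the comment sits directly under a structural node) and its author, and when a comment has several replies, each reply path is followed separately.

```python
def chain_lengths(comments):
    """comments: dict id -> (parent_id or None, author).

    Returns dict id -> length of the chain ending at that comment, i.e. the
    longest run of ancestors in which exactly two users alternate.
    """
    length = {}

    def visit(cid):
        if cid in length:
            return length[cid]
        parent, author = comments[cid]
        if parent is None or comments[parent][1] == author:
            length[cid] = 1  # isolated comment or self-reply
        else:
            grand = comments[parent][0]
            # Extend the parent's chain only if this author is the same
            # user the parent was replying to (A <- B <- A ...).
            if visit(parent) >= 2 and comments[grand][1] == author:
                length[cid] = length[parent] + 1
            else:
                length[cid] = 2
        return length[cid]

    for cid in comments:
        visit(cid)
    return length


def n_chains(comments, min_length=3):
    """Lengths of maximal chains with at least `min_length` comments.
    Sub-parts of a longer chain are not counted."""
    length = chain_lengths(comments)
    extended = {
        parent
        for cid, (parent, _) in comments.items()
        if parent is not None and length[cid] == length[parent] + 1
    }
    return sorted(
        length[cid]
        for cid in comments
        if cid not in extended and length[cid] >= min_length
    )


# Toy discussion (hypothetical): A and B alternate four times, then C joins;
# D and E form a second, shorter chain.
toy_comments = {
    "c1": (None, "A"), "c2": ("c1", "B"), "c3": ("c2", "A"),
    "c4": ("c3", "B"), "c5": ("c4", "C"),
    "c6": (None, "D"), "c7": ("c6", "E"), "c8": ("c7", "D"),
}
print(n_chains(toy_comments))
```

In the toy discussion, the reply by C does not extend the chain between A and B, so that chain is reported with length 4.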
In Table 5 901 900 896 905 904 903 902 891 882 851 849 846 845 732 931 915 906 897 916 876 875 832 884 883 862 853 852 850 917 714 717 719 723 693 840 724 733 734 687 834 833 7 4 47 5 0 753 757 7 6 57 7 47 7 7 796 918 885 863 854 820 7 8 47 8 8 8 0 58 1 1 919 694 841 720 886 835 725 735 688 745 7 5 47 5 8 864 855 7 6 07 6 6 778 789 920 806 695 721 726 727 736 746 856 755759 696 761767 790 807 722 728 857 737 747 697 762 768 791 707 729 738 698 769 730 699 30 895 890 887 874 873 870 867 861 686 706 10 20 chain length 928 926 924 881 844 839 718 692 623 673 0 879 878 872 869 866 860 848 7 3 17 4 0 7 4 27 4 8 708 709 710 711 712 713 715 716 685 678 880 843 836 817 808 815 7 8 57 9 27 9779890 2 689 691 677 666 894 892 889 847 661 669 674 676 679 681 683 684 690 670 644 868 8 5896 5 675 680 682 663 667 664 665 643 625 626 898 871 642 648 653 618 620 969 949 948 930 929 927 925 923 922 912 911 910 909 907 877 657 641 647 652 5 921 893 888 599 662 616 621 908 634 321 332 946 945 944 943 940 939 938 936 934 933 932 858 0 935 333 601 608 967 959 958 957 956 950 937 600 607 326 974 968 955 304 322 323 984 982 983 973 972 966 963 965 962 961 316 317 312 987 980 979 976 971 964 960 314 315 4 Figure 3: Distributions of the number of (left) comments and users per discussion page, and (right) discussion chains of different lengths in the dataset. Percentages indicate the proportion of comments in the different types of chains. 
344 343 985 978 975 305 307 318 3 353 342 986 977 306 308 2 351 341 981 309 0 1 10 10 10 10 10 number of users, comments 10 352 350 301 302 303 299 310 1 10 354 359 340 970 293 295 2 10 367 349 989 988 289 3 10 366 358 335 4 10 1−chains (36.6%) 2−chains (40.7%) n−chains (22.7%) −6 PL−fit: y~x 372 371 339 338 337 267 290 291 292 294 296 288 284 374 348 537 5 10 373 273 285 383 347 346 254 384 382 357 356 355 345 265 269 275 298 368 363 361 287 282 283 379 376 386 378 385 393 392 391 389 381 398 396 390 388 380 377 375 370 369 365 395 387 394 223 401 405 399 217 220 226 231 250 255 258 260 403 402 408 407 406 400 404 397 242 263 264 415 413 411 232 243 248 251 256 0 409 434 427 425 233 2 10 10 0 10 417 416 414 412 10 10 418 420 419 430 429 432 6 4 421 437 436 435 428 426 422 438 444 443 442 433 448 459 457 447 431 244 249 252 257 454 460 458 463 196 423 467 476 473 489 488 466 540 483 5 2572 4 462 482 523 511 475 453 508 504 472 452 494 471 491 470 490 465 487 461 481 480 451 474 450 464 449 493 197 245 253 440 468 477 544 95 103 202 227 228 234 238 246 441 478 7 216 218 221 222 224 225 230 235 237 239 241 247 469 497 506 496 154 207 208 240 5 6 53 5 5 5 4 58 4 7 546 545 513 5 9569508 9 5 8557 8 170 188 219 236 486 507 510 187 189 194 195 198 203 199 204 209 498 526 2 8 2 6 2 2 1 81 6 1 2 8 178 205 210 516 515 5 6 45 5 6 549 514 5 8 56 7 9 200 211 572 70 74 8 07 8 84 92 128 134 135 138 141 144 146 152 127 133 155 137 161 140 163 143 145 164 151 165 184 214 7 57 1 79 59 6 26 0 32 37 35 49 46 58 56 118 147 153 156 81 85 106 136 139 142 157 158 168 175 25 66 82 93 96 97 100 104 119 129 148 176 86 69 479 107 149 159 182 5 6 55 5 7 550 72 98 101 120 517 573 499 76 109 150 185 581 31 30 39 5148 108 186 593 73 83 99 102 121 131 193 500 15 40 52 67 123 132 169 518 566 558 551 124 125 126 number of articles 502 90 115 distribution of chain lengths 10 520 560 7 10 43 number of chains 117 739 770 700 771 701 702 703 772 704 705 Figure 2: The structure of the discussion page of 
“Presidency of Barack Obama” (generated with Graphviz). Blue nodes are structural, green nodes are unsigned comments. the hierarchy of the pages. A reply to a comment is a child node of this comment and comments which are placed below a title or a new page are child nodes of the corresponding structural node unless they reply to another comment. Note that there can be several nested levels of structural nodes as there can be several levels of titles and subtitles. To help in the comprehension of the following analysis we show in Figure 2 one of these trees. It corresponds to the Wikipedia article “Presidency of Barack Obama” (represented by the red node in the centre of the radial tree) and contains 989 nodes of which 254 are structural nodes (in blue in Figure 2) and the rest comments. Note that this article is different from one just on “Barack Obama” cited earlier. The size of the discussions Out of the approx. 3.2 million articles in our dataset nearly 870 000 have an associated discussion page (about 27%), which contain more than 9.4 million signed comments, created by more than 350 000 users (See Table 1 for details). As one would expect the distribution of the number of comments and users among the different articles follows heavy tailed distributions as shown in Figure 3 (left). Although more than 85% of all articles have discussions with only 10 or less comments, there is still a considerable number of articles (approx. 15 000) with more than 100 comments and 826 discussions even contain more than 1000 comments. The largest discussions reach more than 30 000 comments involving several thousand users. 2 See http://en.wikipedia.org/wiki/Talk: Central_Bosnia_Canton 180 number of chains per article The deepest discussion can be found about the article “Liberal democracy”3 . It reaches a depth of 42, while its hindex is only 12. The maximal h-index is observed for “Anarchism” (h-index = 20). 
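The contrast between depth and h-index can be made concrete with a small sketch. It assumes the h-index of a discussion tree is the largest level h such that every level up to h holds at least as many nodes as its level number, with the root counted as level one; the input format (a hypothetical dictionary mapping each node to its parent) and the function names are illustrative only.

```python
from collections import defaultdict


def nodes_per_level(parent_of, root):
    """Count the nodes on each level of the tree.

    parent_of maps every non-root node to its parent; the returned list
    starts with level 1, which contains only the root.
    """
    children = defaultdict(list)
    for node, parent in parent_of.items():
        children[parent].append(node)
    counts, frontier = [], [root]
    while frontier:
        counts.append(len(frontier))
        frontier = [child for node in frontier for child in children[node]]
    return counts


def tree_h_index(counts):
    """Largest level h such that levels 1..h each hold >= level nodes."""
    h = 0
    for level, n_nodes in enumerate(counts, start=1):
        if n_nodes < level:
            break
        h = level
    return h


# A broad toy tree: 1 root, 3 nodes on level 2, 3 on level 3, 2 on level 4.
broad = {"a": "root", "b": "root", "c": "root",
         "d": "a", "e": "a", "f": "b", "g": "d", "h": "d"}
# A single long reply chain: deep, but only one node per level.
chain = {"n1": "root", "n2": "n1", "n3": "n2", "n4": "n3"}

print(tree_h_index(nodes_per_level(broad, "root")))
print(tree_h_index(nodes_per_level(chain, "root")))
```

The single reply chain is deeper than the broad tree but receives a lower h-index, which mirrors the gap between a depth of 42 and an h-index of 12 reported for “Liberal democracy”.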
In columns 6 and 7 of Table 5 we present the h-index and maximal depth of the 20 most discussed articles. From the rank values in parentheses we can conclude that the rank by h-index is closer to the rank by number of chains than the rank by maximal depth of these discussions. The maximal depth is very sensitive to the presence of isolated individual discussions between a small number of users, which can reach considerable depths while not being representative of the entire discussion. The h-index overcomes this limitation, and we will therefore use it in the next section to account for the depth of the discussions.

[Figure 4, left: #articles vs. #chains; right: ccdf of #articles vs. #chains, showing the data, a truncated log-normal fit (μ = −6.5, σ = 3.0, p-value = 0.997) and a power-law fit (α = 2.23, p-value = 0.087, cut-off x_min = 21, 95.75% discarded).]

Figure 4: Number of discussion chains per discussion page.

[Figure 5 panels: #articles vs. max depth and #articles vs. h-index(structure), for all discussions; insets: only discussions with more than 100 nodes.]

Comparison with Categories

In this section we investigate whether the structure of the article discussions differs for the different topic categories of Wikipedia articles.

Assigning articles to macrocategories

Assigning Wikipedia articles to a set of topics is not a trivial task, as each article is usually assigned only to low-level categories, which can in turn be associated with many super-categories. Links to categories and super-categories are managed by users inside the wiki text, so the result is a rich but inconsistent hierarchical structure made of more than 500 000 categories, in which one can find several loops.
To deal with this disordered semantic information we relied on the approach proposed by Kittur, Chi, and Suh (2009) to classify articles according to a limited set of top-level categories, or macrocategories. The algorithm starts by considering, for each article, all the categories to which it has been directly assigned. Each of these labels is in turn assigned to the closest macrocategory in the category graph. The extent to which an article belongs to each macrocategory is computed as a weight, quantifying the proportion of directly assigned categories which belong to that macrocategory. In the case of equally short paths from a label to multiple macrocategories, the contribution of this category is split among the equidistant closest macrocategories.

Kittur, Chi, and Suh computed the article assignments in 2009 based on 11 macrocategories; we ran the same algorithm with the 21 macrocategories shown in Figure 6, corresponding (with minor adjustments) to the current official Wikipedia top-level categories. Though the category graph is based on directed relationships linking categories to super-categories, Kittur et al. considered it as an undirected graph to compute the shortest paths between each category and the macrocategories. This allows all categories to be assigned to some macrocategory, whereas in the directed graph many categories would remain disconnected. However, our intuition is that link direction in the hierarchy matters. We therefore modified the algorithm by assigning a higher weight to the edges followed in the wrong direction. Penalising these edges by a factor of 3 led to a significant improvement in performance.

Figure 5: Distribution of the max. depth (left) and the h-index (right) of the article discussion pages.

In Table 5 we list the top 20 articles according to the number of chains and compare it with the total number of comments, the number of registered users who comment, and the number of edits of the corresponding article page.
The numbers in parentheses indicate the rank of the article when ordering by the corresponding variable. Note that in nearly all cases the number of comments is much larger than the number of actual edits on the corresponding article page. Most of the topics in the list also represent highly disputed subjects in real life, due to political, ideological, religious or scientific disputes. They seem to be a good barometer of contentious discussion topics in the last few years. We also list the depth of these discussion trees, which we treat separately in the next subsection.

Depths of the discussions

We study the depth of the discussions using two measures: the maximal depth (i.e., the level of the deepest comment in the discussion tree) and the h-index, a balanced depth measure which was introduced in (Gómez, Kaltenbrunner, and López 2008) on the basis of the h-index of (Hirsch 2005). To calculate the h-index of the tree structure we count the number of nodes per level, starting at level one (the root node) and descending the tree. The h-index of the tree is then the maximum level for which the corresponding number of nodes is greater than or equal to the level number (and all previous levels fulfil the same condition). Note that we also consider structural nodes in these calculations.

In Figure 5 we show the distributions of the maximal depth and the h-index for all discussions and, in the insets, only for those with more than 100 nodes. We observe that the two distributions have a similar shape but slightly different modes.

³ See http://en.wikipedia.org/wiki/Talk:Liberal_democracy/Archive_2

| # | Title | chains | comments | users | h-index | max. depth | edits |
|---|---|---|---|---|---|---|---|
| 1 | Intelligent design | 2413 | 22454 (3) | 954 (13) | 16 (20) | 20 (358) | 9179 (53) |
| 2 | Gaza War | 2358 | 17961 (6) | 607 (47) | 19 (2) | 27 (28) | 11499 (29) |
| 3 | Barack Obama | 2301 | 22756 (2) | 2360 (2) | 18 (6) | 21 (245) | 17453 (6) |
| 4 | Sarah Palin | 2182 | 19634 (4) | 1221 (9) | 17 (10) | 25 (56) | 12093 (24) |
| 5 | Global warming | 2178 | 19138 (5) | 1382 (5) | 17 (10) | 20 (358) | 14074 (15) |
| 6 | Main Page | 2065 | 32664 (1) | 5969 (1) | 15 (34) | 22 (169) | 4003 (674) |
| 7 | Chiropractic | 1772 | 13684 (13) | 243 (389) | 18 (6) | 29 (17) | 6190 (204) |
| 8 | Race and intelligence | 1764 | 13790 (12) | 410 (126) | 17 (10) | 24 (74) | 7615 (100) |
| 9 | Anarchism | 1589 | 14385 (9) | 496 (76) | 20 (1) | 28 (22) | 12589 (19) |
| 10 | British Isles | 1556 | 12044 (16) | 576 (56) | 17 (10) | 23 (113) | 4047 (658) |
| 11 | Climatic Research Unit hacking incident | 1551 | 11536 (17) | 474 (88) | 17 (10) | 20 (358) | 2346 (2364) |
| 12 | Jesus | 1397 | 17916 (7) | 1239 (7) | 13 (119) | 16 (1383) | 17081 (7) |
| 13 | Circumcision | 1356 | 10469 (21) | 436 (113) | 17 (10) | 26 (42) | 7354 (117) |
| 14 | Homeopathy | 1323 | 13509 (14) | 516 (68) | 17 (10) | 25 (56) | 6902 (151) |
| 15 | George W. Bush | 1281 | 15257 (8) | 1969 (3) | 14 (65) | 18 (676) | 32314 (1) |
| 16 | September 11 attacks | 1250 | 13830 (11) | 1244 (6) | 16 (20) | 26 (42) | 11086 (30) |
| 17 | Evolution | 1165 | 13404 (15) | 942 (16) | 13 (119) | 23 (113) | 9780 (44) |
| 18 | Catholic Church | 1162 | 14104 (10) | 620 (43) | 15 (34) | 18 (676) | 14082 (14) |
| 19 | Cold fusion | 1098 | 8354 (29) | 359 (174) | 15 (34) | 20 (358) | 4320 (557) |
| 20 | 2008 South Ossetia war | 1075 | 10596 (20) | 853 (20) | 17 (10) | 23 (113) | 9930 (43) |

Table 5: Several structural measures of the top 20 Wikipedia discussions ordered by the number of n-chains (length ≥ 3).

[Figure 6, bar chart "% of articles with discussions in different categories". Categories: Arts, Computing, Philosophy, Health, Belief, Mathematics, Language, Science, Business, Environment, Law, Education, Technology and applied sciences, Politics, Society, Sports, Agriculture, People, Culture, History and events, Geography and places.]

The improvement in the performance of the category assignment was evaluated over a random sample of 300 manually assigned articles (Farina, Tasso, and Laniado 2010).

Structural differences between the categories

We investigate the proportion of pages with discussions among the different categories. We use the category weights for this calculation.
So, if an article has a 60% weight in category A and a 40% weight in category B, it contributes to these two categories in the corresponding proportions. The black bars in Figure 6 show these general proportions of articles with discussions (the corresponding percentage is written on the right y-axis). We observe a large heterogeneity among the different categories. “Geography and Places” and “History and events” are nearly of the same size and together account for more than 46% of all discussion pages. The next two categories, “Culture” and “People”, account for another 20%. Interestingly, if we restrict this analysis to the top 1% or 0.1% of the article discussion pages (according to their number of comments), we observe rather different distributions (indicated by the grey and white bars in Figure 6). The “Geography and Places” proportion drops to only half of its original value, while some other categories such as “Belief”, “Society”, “Philosophy”, “Law” and “Politics” approximately double their share. This change seems to indicate that these categories, although less frequent among the entire set of articles with discussions, attract an above-average number of comments. To investigate this further, we calculated for every category (using the category weights of every article) the weighted average of several structural metrics presented in the previous subsections. The outcome of this analysis is presented in the form of two cross-plots in Figure 7. To verify the results we performed a bootstrap test (N = 1000) and depict the 95% confidence interval of the observed average value with grey areas.
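The category-weighted averaging and the bootstrap check described above can be illustrated with a minimal Python sketch. It is not the code used for the paper: the function names, the toy thread and the toy values are invented for illustration, and the percentile bootstrap over articles is an assumption, since the text only states that a bootstrap test with N = 1000 was performed. The h-index function follows the definition given in the subsection on depths (root at level one, deepest level h such that every level up to h holds at least as many nodes as its level number).

```python
import random
from collections import Counter


def tree_h_index(children, root):
    """h-index of a discussion tree: the deepest level h such that every
    level l <= h holds at least l nodes (the root is level 1)."""
    counts = Counter()
    frontier, level = [root], 1
    while frontier:
        counts[level] = len(frontier)
        frontier = [c for node in frontier for c in children.get(node, [])]
        level += 1
    h = 0
    while counts[h + 1] >= h + 1:  # a missing level counts as 0 nodes
        h += 1
    return h


def weighted_mean(values, weights):
    """Average of `values` where each article counts with its category weight."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def bootstrap_interval(values, weights, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the weighted mean (assumed scheme).

    Articles are resampled with replacement; articles with zero weight in the
    category are dropped first. Assumes at least one positive weight."""
    rng = random.Random(seed)
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(pairs, k=len(pairs))
        vs, ws = zip(*sample)
        means.append(weighted_mean(vs, ws))
    means.sort()
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi


# Toy thread (hypothetical): root "r" has three replies, "a" has two, "d" has one.
toy_tree = {"r": ["a", "b", "c"], "a": ["d", "e"], "d": ["f"]}

# Invented h-index values of five articles and their weights in one category.
h_values = [1, 2, 2, 3, 5]
cat_weights = [1.0, 0.6, 0.4, 1.0, 0.2]

observed = weighted_mean(h_values, cat_weights)
low, high = bootstrap_interval(h_values, cat_weights)
print("toy tree h-index:", tree_h_index(toy_tree, "r"))
print("weighted mean h-index:", observed, "bootstrap interval:", (low, high))
```

In the toy thread the levels hold 1, 3, 2 and 1 nodes, so the third level (2 nodes, fewer than 3) stops the count and the h-index is 2, while the maximal depth is 4. Weighting by category membership means an article split 60/40 between two categories influences each category's average in that proportion.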
Where a grey area is absent in Figure 7, the symbol is larger than the corresponding confidence interval.

[Figure 6: horizontal bar chart, one group of bars per category. x-axis: % of articles with discussion (0 to 24). Legend: all articles with discussion pages; top 1% with most comments; top 0.1% with most comments. Percentage labels on the right axis, in extracted order: 0.6%, 0.5%, 1.0%, 1.2%, 2.0%, 0.7%, 2.0%, 1.9%, 1.6%, 1.2%, 1.0%, 3.1%, 3.2%, 2.4%, 4.8%, 2.6%, 3.4%, 8.1%, 12.1%, 23.4%, 23.3%.]

Figure 6: Proportion of articles within different categories for all discussion pages

We can extract several interesting conclusions from the two cross-plots. From the right cross-plot we see a clear correlation between the average depth of the discussion, measured with the h-index, and the average number of users who left at least one comment in the discussion. As discussed above, the category “Geography and Places” is on average the one with the flattest discussions. On the other hand, the top categories according to these measures are “Philosophy”, “Law”, “Language” and “Belief”. These categories trigger, on average, the deepest discussions involving the largest number of users. They are also among the top categories if we use the number of discussion chains as a measure, as can be seen from the left sub-figure of Figure 7, where we compare the number of edits with the number of discussion chains. This seems to agree with (Kittur, Chi, and Suh 2009), where the amount of conflict in different macrocategories is estimated according to a page-level heuristic, and “Belief” and “Philosophy” are identified as the most contentious categories. While the authors find the lowest level of conflict in the category “Mathematics”, from Figure 7 we can observe that, although this category has the lowest average number of edits per article, it still reaches a considerable amount of debate in the form of discussion chains. Some article categories like “People”, “Arts” or even “Sports” are on average less discussed than their number of edits would suggest.

[Figure 7, left panel: # chains in discussion (x-axis) vs. # edits in article (y-axis). Right panel: # users in discussion (x-axis) vs. h-index of discussion (y-axis). One symbol per category: Technology and applied sciences, Geography and places, History and events, Culture, People, Sports, Society, Politics, Education, Law, Environment, Business, Science, Language, Mathematics, Belief, Agriculture, Health, Philosophy, Computing, Arts.]

Figure 7: Differences between Categories: (left) the average number of chains in the article discussion pages vs. the average number of article edits, (right) the average number of users versus the h-index of the tree structure. Gray areas indicate (when larger than the corresponding symbol size) the 95% confidence interval.

In the right subplot we also observe two outliers to the otherwise quite correlated averages: the categories “Computing” and “Mathematics” behave differently. Discussions on “Computing” articles involve more users than their average h-index would suggest, while articles of the “Mathematics” category show the opposite behaviour: their discussions are deeper but involve fewer users than expected.

To summarise, we observe quite different relations between the discussion structures and the number of edits among the different categories. We have also found that the size and shape of the discussions vary significantly among the different categories. A more detailed analysis involving the study of individual user activities in the different categories might shed further light on whether these differences are community or content based.

Related Studies

Kittur et al. (2007) describe the growth of the hidden side of Wikipedia, comprising talk pages and all Wikipedia-specific pages devoted to conflict and coordination. A longitudinal study investigating the role of coordination in the improvement of Wikipedia articles’ quality is described in (Kittur and Kraut 2008); positive improvements are observed as an effect of discussion only on small and “young” pages. Presence of discussion is measured only in terms of the size of the article talk pages, and patterns of communication are not considered.

Few researchers have focused on Wikipedia talk pages to study social relationships between users. Crandall et al. (2008) investigate the interplay between social ties, modelled as interactions in discussion pages, and similarity, modelled as editing activity on the same articles. They find evidence of a feedback effect between the two phenomena. Only a 15% overlap is found between the graph of social interactions and the one of similarity; interestingly, properties of the social network prove to be better predictors of future behaviour than properties of the similarity network. A qualitative description of different patterns corresponding to different social roles in Wikipedia is offered in (Gleave et al. 2009), based on the local network of personal communications around single users.

Tang, Biuk-Aghai, and Fong (2008) propose a model for the representation of weighted co-authorship relationships, while in (Laniado and Tasso 2011) the main editors of each article are selected as authors in order to build a collaboration network over the whole Wikipedia. More detailed models have been proposed to represent different kinds of interactions in editing activity (Brandes et al. 2009). Both approaches, extracting co-authorship networks and extracting edit networks, are complementary to ours and could be integrated to contrast direct replies in discussion pages with relationships emerging from editing activity.

To the best of our knowledge, in this paper we propose the first extensive study of Wikipedia as a discussion space. Similar analyses have been performed for blogs (Mishne and Glance 2006) and online discussion boards (Gómez, Kaltenbrunner, and López 2008). A generative model for the structure of the discussion threads analysed here has been presented in (Gómez, Kappen, and Kaltenbrunner 2011). The model parameters show important structural differences between the discussions in Wikipedia and those of other social media platforms.

Conclusions

In this paper we have focused on Wikipedia talk pages to detect structural patterns of interaction which accompany collaboration on the project. The study of directed assortativity reveals the existence of a characteristic pattern in the reply network extracted from article discussion pages. Users who reply to many other users tend to reply preferentially to inexperienced users, while the Wikipedians who receive comments from many users are more likely to interact with each other. This pattern is observed neither in the Slashdot reply network nor in personal conversations in Wikipedia. We suggest that it derives from the nature of discussion on article talk pages, focused on solving issues and controversies according to codified community policies, and reflects the existence of different social roles among the more influential users.

The study of the shape and size of the discussions at the article level reveals interesting patterns and suggests some metrics to characterize different talk pages. The number of chains of direct replies between pairs of users seems to be a good indicator of contentious discussion topics, while the h-index of the tree is a compact measure to capture the actual depth of a discussion. We found evidence of significant differences in discussions from different semantic areas. For example, discussions about Mathematics tend to reach a much higher depth than the number of users involved and the number of edits in the corresponding articles would suggest.

This work offers a first insight into Wikipedia as a space of discussion and opens many directions for improvement and for future investigation. The comparison of users’ behaviour in the different networks (and maybe also in networks derived from interactions in article editing) could help in the identification of social roles. A more fine-grained analysis involving the time-stamps of the comments may allow for a better understanding of social dynamics on a temporal dimension, and for the detection of contentious topics during a certain interval of time.

Acknowledgements

Yana Volkovich acknowledges support from the Torres Quevedo Program from the Spanish Ministry of Science and Innovation, co-funded by the European Social Fund.

References

Brandes, U.; Kenis, P.; Lerner, J.; and van Raaij, D. 2009. Network analysis of collaboration structure in Wikipedia. In Proc. of WWW 2009.
Crandall, D. J.; Cosley, D.; Huttenlocher, D. P.; Kleinberg, J. M.; and Suri, S. 2008. Feedback effects between similarity and social influence in online communities. In Proc. of KDD 2008.
Farina, J.; Tasso, R.; and Laniado, D. 2010. Automatically assigning Wikipedia articles to macrocategories. http://airwiki.ws.dei.polimi.it/index.php/WCG.
Foster, J. G.; Foster, D. V.; Grassberger, P.; and Paczuski, M. 2010. Edge direction and the structure of networks. PNAS 107(24):10815–10820.
Gleave, E.; Welser, H. T.; Lento, T. M.; and Smith, M. A. 2009. A conceptual and operational definition of ‘social role’ in online community. In Proc. of HICSS 2009.
Gómez, V.; Kaltenbrunner, A.; and López, V. 2008. Statistical analysis of the social network and discussion threads in Slashdot. In Proc. of WWW 2008.
Gómez, V.; Kappen, H. J.; and Kaltenbrunner, A. 2011. Modeling the structure and evolution of discussion cascades. In Proc. of Hypertext 2011.
Hirsch, J. E. 2005. An index to quantify an individual’s scientific research output. PNAS 102(46):16569–16572.
Hu, H.-B., and Wang, X.-F. 2009. Disassortative mixing in online social networks. Europhysics Letters 86(1):18003.
Kittur, A., and Kraut, R. E. 2008. Harnessing the wisdom of crowds in Wikipedia: quality through coordination. In Proc. of CSCW 2008.
Kittur, A.; Suh, B.; Pendleton, B.; and Chi, E. 2007. He says, she says: Conflict and coordination in Wikipedia. In Proc. of SIGCHI 2007.
Kittur, A.; Chi, E. H.; and Suh, B. 2009. What’s in Wikipedia?: mapping topics and conflict using socially annotated category structure. In Proc. of CHI 2009.
Laniado, D., and Tasso, R. 2011. Co-authorship 2.0: Patterns of collaboration in Wikipedia. In Proc. of Hypertext 2011.
Mishne, G., and Glance, N. 2006. Leave a reply: An analysis of weblog comments. In Proc. of WWE 2006.
Newman, M., and Park, J. 2003. Why social networks are different from other types of networks. Physical Review E 68(3):36122.
Newman, M. E. J. 2002. Assortative mixing in networks. Physical Review Letters 89(20):208701.
Schneider, J.; Passant, A.; and Breslin, J. 2010. A qualitative and quantitative analysis of how Wikipedia talk pages are used. In Proc. of WebSci 2010.
Stvilia, B.; Twidale, M. B.; Smith, L. C.; and Gasser, L. 2008. Information quality work organization in Wikipedia. JASIST 59(6):983–1001.
Suh, B.; Convertino, G.; Chi, E. H.; and Pirolli, P. 2009. The singularity is not near: slowing growth of Wikipedia. In Proc. of WikiSym 2009.
Szell, M.; Lambiotte, R.; and Thurner, S. 2010. Multirelational organization of large-scale social networks in an online world. PNAS 107(31):13636–13641.
Tang, L. V.-S.; Biuk-Aghai, R. P.; and Fong, S. 2008. A method for measuring co-authorship relationships in mediawiki. In Proc. of WikiSym 2008.
Viégas, F.; Wattenberg, M.; Kriss, J.; and van Ham, F. 2007. Talk Before You Type: Coordination in Wikipedia. In Proc. of HICSS 2007.