Data and networks
GIACS Conference Palermo 9-4-08

•Networks

•Networks as an instrument of Data Filtering
Correlation based Minimal Spanning Tree: 1071 stocks traded at NYSE between 1987 and 1998. Different colours refer to different SIC sectors.
Correlation based Minimal Spanning Tree: artificial market of 1071 stocks according to the one factor model. Different colours refer to different SIC sectors.
G. Bonanno, G. Caldarelli, F. Lillo, R. N. Mantegna, Topology of correlation based minimal spanning trees in real and model markets, Physical Review E 68 046130 (2003).
G. Bonanno, GC, F. Lillo, S. Miccichè, N. Vandewalle, R. N. Mantegna, Networks of equities in financial markets, European Physical Journal B 38 363-372 (2004).

•The Cosin project
COSIN (official number IST-2001-33555) was a Research Project financed by the European Commission through the Fifth Framework Programme. COSIN is part of the actions taken by Future and Emerging Technologies (FET) in the priority research area of Information Society Technologies (IST) (http://www.cordis.lu/IST/FET).
Documents at http://www.cosinproject.org

•The Cosin project
COSIN involves 7 different nodes in 5 countries:
A. (Ph + CS) Roma, Italy
B. (Ph) Barcelona, Spain
C. (Ph) Lausanne, Switzerland
D. (Ph) ENS, Paris, France
E. (CS) Karlsruhe, Germany
F. (Ph) UPSud, Paris, France
[Map legend: EU countries 2001; non-EU countries 2001; EU COSIN participant; non-EU COSIN participant]

•Some of the Cosin people
G. Bonanno, G. Caldarelli, F. Colaiori, G. Di Battista, D. Donato, S. Leonardi, R. Mantegna, A. Marchetti-Spaccamela, M. Patrignani, L. Pietronero, V. Servedio
A. Arenas, M. Boguña, A. Díaz-Guilera, R. Ferrer i Cancho, M.A. Muñoz, M.A. Serrano, R. Pastor-Satorras
G. Bianconi, A. Capocci, P. De Los Rios, T. Erlebach, T. Petermann, Y.-C. Zhang
A. Barrat, S. Battiston, P. Nadal, A. Vespignani, G. Weisbuch, U. Brandes, M. Gaertler, M. Kaufmann, D. Wagner

•The Cosin project
1. To develop a unified set of Complex Systems theoretical methodologies for the characterization of Complex Networks.
2. To develop statistical models for network growth and evolution.
3. To collect data, mainly for the Internet and the World Wide Web.
4. To extend the analysis to social and economic networks.
5. To develop visualization tools for large scale systems.
6. To disseminate results through publications, conferences and the project web site.

•A Cosin summary
1. After three years of activity we have a common ground of methodologies and tools, at least between computer scientists and physicists (and also some economists). Some more effort would be necessary to integrate social scientists.
2. We provided a class of models for network growth and evolution; moreover, we addressed the study of the statistical properties of weighted networks.
3. Data collection for the Internet and the World Wide Web proved much more difficult than expected. In the meantime, larger consortia have been funded specifically for this task. Thanks to external collaborations we still found the data to validate the models we produced.
4. In economic and financial networks, COSIN people are on the frontline of this very new field of research. This new approach attracted the interest of the community up to the level of Nobel laureates. The impact on social science has been less successful. The impact on biology (botany, zoology) has been unexpected and very successful.
5. The standard visualization problem is to keep all the graph structure and present it suitably. On this point some progress has been made; it is worth mentioning that several ideas are now under consideration for the visualization of “simplified graphs”.
6. The project had a considerable impact on the scientific community in terms of citations, visibility, conferences, schools, books and data downloads from the site.
Maybe some more work could be done for the general public.

The graph of scientific collaborations on scale-free networks in statistical physics (M.E.J. Newman, PRE 69 026113 (2004)).

•Dissemination
• More than 150 refereed papers (some of them in Nature, PNAS, PRL, LNCS)
• Lectures and talks at various international conferences (for physics: STATPHYS, APS Meetings) and invited talks at various institutions
• Books
The Sitges Conference published the proceedings of the most interesting talks in a special volume: Statistical Mechanics of Complex Networks, Series: Lecture Notes in Physics, Vol. 625, Pastor-Satorras, Romualdo; Rubi, Miguel; Diaz-Guilera, Albert (Eds.), 2003, XII, 206 p., Hardcover, ISBN: 3-540-40372-8.
The Rome Conference published its proceedings in a special issue of the European Physical Journal B.

•Web site

•What about data?
Trivially, access to data was crucial for the project.
In some cases we found very nice datasets and could work on them:
1. Internet (AS topology)
2. Wikipedia
In the presence of poor or no data, we obtained (of course) only partial results:
1. Liquidity shocks
2. River networks

STATISTICAL PROPERTIES OF THE WIKIGRAPH
L.S. Buriol, A. Capocci, F. Colaiori, D. Donato, S. Leonardi, F. Rao, V. Servedio, GC
Centro “E. Fermi”
1. Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia, A. Capocci, F. Rao, GC, Europhysics Letters 81 28008 (2008) (arXiv:0710.3058).
2. Preferential attachment in the growth of social networks: the Internet encyclopedia Wikipedia, A. Capocci, V.D.P. Servedio, F. Colaiori, L.S. Buriol, D. Donato, S. Leonardi, GC, Physical Review E 74 036116 (2006).

•Wikipedia intro
Wikipedia in other languages
You may read and edit articles in many different languages:
Wikipedia encyclopedia languages with over 100,000 articles
Deutsch (German) · Français (French) · Italiano (Italian) · 日本語 (Japanese) · Nederlands (Dutch) · Polski (Polish) · Português (Portuguese) · Svenska (Swedish)
Wikipedia encyclopedia languages with over 10,000 articles
العربية (Arabic) · Български (Bulgarian) · Català (Catalan) · Česky (Czech) · Dansk (Danish) · Eesti (Estonian) · Español (Spanish) · Esperanto · Galego (Galician) · עברית (Hebrew) · Hrvatski (Croatian) · Ido · Bahasa Indonesia (Indonesian) · 한국어 (Korean) · Lietuvių (Lithuanian) · Magyar (Hungarian) · Bahasa Melayu (Malay) · Norsk bokmål (Norwegian) · Norsk nynorsk (Norwegian) · Română (Romanian) · Русский (Russian) · Slovenčina (Slovak) · Slovenščina (Slovenian) · Српски (Serbian) · Suomi (Finnish) · Türkçe (Turkish) · Українська (Ukrainian) · 中文 (Chinese)
Wikipedia encyclopedia languages with over 1,000 articles
Alemannisch (Alemannic) · Afrikaans · Aragonés (Aragonese) · Asturianu (Asturian) · Azərbaycan (Azerbaijani) · Bân-lâm-gú (Min Nan) · Беларуская (Belarusian) · Bosanski (Bosnian) · Brezhoneg (Breton) · Чăваш чěлхи (Chuvash) · Corsu (Corsican) · Cymraeg (Welsh) · Ελληνικά (Greek) · Euskara (Basque) · فارسی (Persian) · Føroyskt (Faroese) · Frysk (Western Frisian) · Gaeilge (Irish) · Gàidhlig (Scots Gaelic) · हिन्दी (Hindi) · Interlingua · Íslenska (Icelandic) · Basa Jawa (Javanese) · ქართული (Georgian) · ಕನ್ನಡ (Kannada) · Kurdî / كوردی (Kurdish) · Latina (Latin) · Latviešu (Latvian) · Lëtzebuergesch (Luxembourgish) · Limburgs (Limburgish) · Македонски (Macedonian) · मराठी (Marathi) · Napulitana (Neapolitan) · Occitan · Ирон (Ossetic) · Plattdüütsch (Low Saxon) · Scots · Sicilianu (Sicilian) · Simple English · Shqip (Albanian) · Sinugboanon (Cebuano) · Srpskohrvatski/Српскохрватски (Serbo–Croatian) · தமிழ் (Tamil) ·
Tagalog · ภาษาไทย (Thai) · Tatarça (Tatar) · తెలుగు (Telugu) · Tiếng Việt (Vietnamese) · Walon (Walloon)

•Wikipedia intro
The datasets of each language are available as two self-extracting files for a MySQL database. The table cur contains the current online articles, whereas the table old contains all previous versions of each current article. Old versions of an article are identified by having the same title, not the same id. The dataset dumps are updated almost weekly, so the current graph is usually not more than a week old.
To generate a graph from the link structure of a dataset, each article is considered a node and each hyperlink between articles is a link in this graph. In the Wikipedia datasets, each webpage is a single article. An article might also contain some external links that point to pages outside the dataset. Usually Wikipedia articles have no external links, or just a few of them. These links are not considered when generating the wikigraphs, since we want to restrict the graph to pages in the set being analyzed.

•Wikipedia interests
• sociological reasons: the encyclopedia collects pages written by a number of independent and heterogeneous individuals. Each of them autonomously decides on the content of the articles, with the only constraint being a predefined layout. This autonomy is a common feature of content creation on the Web. The Wikipedia authors’ community is formed by members whose only wish is to make available to the world concepts and topics that they consider meaningful. In some sense, tracing the evolution of the Wikipedia subsets should mirror the development of significant trends within each linguistic community.
• generation over time: Wikipedia provides time information associated with nodes.
Moreover, it provides historical information: the times of creation and of each modification for every page in the dataset.
• independence from external links: Wikipedia articles link mainly to articles in the same dataset.
• variety of graph sizes: one graph can be collected per language, and the graph sizes vary from a few hundred pages up to half a million pages.

•Results
Summarizing:
• We have the whole history of growth available, so that we can study the evolution
• We have an example of a “social” network of huge size
• We can compare the systems produced by users of different languages, thereby measuring the effect of different cultures
• We can study Wikipedia as a case study for the World Wide Web
WE RECOVER A PREFERENTIAL ATTACHMENT MECHANISM FROM THE DATA.
DIFFERENT LANGUAGES PRODUCE SIMILAR STRUCTURES.
WE FIND A SYSTEM SIMILAR TO THE WWW EVEN IF THE MICROSCOPIC RULE OF GROWTH IS VERY DIFFERENT.

•The Wiki graphs
We generated six wikigraphs, wikiEN, wikiDE, wikiFR, wikiES, wikiIT and wikiPT, from the English, German, French, Spanish, Italian and Portuguese datasets, respectively. The graphs were obtained from an old dump of June 13, 2004. We are not using the current data due to disk space restrictions: the English dataset of June 2005 is more than 36 GB compressed, that is about 200 GB expanded.
The most visited page was the main page for wikiEN, wikiDE, wikiFR and wikiES, while for the wikiIT and wikiPT datasets no visits were associated with the pages.

• SCC (Strongly Connected Component) includes pages that are mutually reachable by traveling on the graph
• IN component is the region from which one can reach the SCC
• OUT component encompasses the pages reached from the SCC
• TENDRILS are pages reachable from the IN component and not pointing to the SCC or OUT regions. TENDRILS also include those pages that point to the OUT region and do not belong to any of the other defined regions
• TUBES directly connect the IN and OUT regions
• DISCONNECTED regions are those isolated from the rest
The Bow-tie structure, found in the WWW (Broder et al. Comp. Net. 33, 309, 2000)

•The Wikigraphs
[Table: the size of the Wikigraph for the various languages.]
[Table: the percentage of the various components of the Wikigraph for the various languages.]

•Power laws (what else?)
The degree shows fat tails that can be approximated by a power-law function of the kind $P(k) \sim k^{-\gamma}$, where the exponent is the same for both in-degree and out-degree.
In the case of the WWW, $2 \le \gamma_{in} \le 2.1$.
[Figure: in-degree (empty symbols) and out-degree (filled symbols) occurrence distributions for the Wikigraph in English (○) and Portuguese.]

•Correlations
As regards the assortativity (as measured by the average degree of the neighbours of a vertex with degree k), there is no evidence of any assortative behaviour.
[Figure: the average neighbours’ in-degree, computed along incoming edges, as a function of the in-degree for the English (○) and Portuguese Wikigraphs.]

•PageRank
The PageRank distribution for wikiEN is a power-law function with γ = 2.1. Previous measurements on webgraphs exhibit the same behaviour for the PageRank distribution.
We list the number of visits of the top ranked pages just to show that this value is not related to the PageRank values. We confirm that very little correlation was found between the link analysis characteristics and the actual number of visits.

•Preferential attachment
Given the history of growth, one can verify the hypothesis of preferential attachment. This is done by means of the histogram P(k), which gives the number of vertices (whose degree is k) acquiring new connections at time t.
This quantity is weighted by the factor N(t)/n(k,t).
[Figure: English (○) and Portuguese; white = in-degree, filled = out-degree.]
We find preferential attachment for both in-degree and out-degree.

•Updates’ statistics
Other power laws related to the dynamics need to be explained. For example, the number of updates also follows a power law.
[Figure: each point gives the number of nodes (y axis) that were updated exactly x times.]

•Wikipedia growth model
We introduced an evolution rule, similar to other models of rewiring already considered*:
• At each time step, a vertex is added to the network. It is connected to the existing vertices by M oriented edges; the direction of each edge is drawn at random:
• with probability R1, the edge leaves the new vertex, pointing to an existing one chosen with probability proportional to its in-degree;
• with probability R2, the edge points to the new vertex, and the source vertex is chosen with probability proportional to its out-degree;
• finally, with probability R3 = 1 − R1 − R2, the edge is added between existing vertices: the source vertex is chosen with probability proportional to its out-degree, while the destination vertex is chosen with probability proportional to its in-degree.
* See for example Krapivsky, Rodgers and Redner, PRL 86 5401 (2001)

•Wikipedia growth model
From these data it seems that a model in the spirit of BA could reproduce most of the features of the system. Actually:
1) This network is oriented.
2) The preferential attachment in Wikipedia has a somewhat different nature. Here, most of the time, the edges are added between existing vertices, differently from the BA model. For instance, in the English version of Wikipedia a largely dominant fraction, 0.883, of new edges is created between two existing pages, while a smaller fraction of edges points to or leaves a newly added vertex (0.026 and 0.091, respectively).
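The evolution rule listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the results in these slides. The function name `grow_wikigraph`, the two-vertex seed graph, the default `m=3` and the choice to allow self-loops and repeated edges are all assumptions made here. The default `r1` and `r2` are the values quoted in the slides for the English Wikigraph. Picking a uniformly random entry from the list of edge targets (or sources) is a standard way to select a vertex with probability proportional to its in-degree (or out-degree).

```python
import random
from collections import Counter


def grow_wikigraph(n_vertices, m=3, r1=0.026, r2=0.091, seed=0):
    """Grow a directed graph with the R1/R2/R3 rule; return edge endpoint lists.

    sources[i] -> targets[i] is the i-th edge. The seed graph, m and the
    handling of self-loops/multi-edges are assumptions of this sketch.
    """
    rng = random.Random(seed)
    # Assumed seed: two vertices linked in both directions, so that both
    # start with non-zero in-degree and out-degree.
    sources = [0, 1]
    targets = [1, 0]
    for new in range(2, n_vertices):
        # Only edges present before this time step define "existing" vertices.
        n_old = len(sources)
        for _ in range(m):
            u = rng.random()
            if u < r1:
                # Edge leaves the new vertex; destination ~ in-degree.
                src, dst = new, targets[rng.randrange(n_old)]
            elif u < r1 + r2:
                # Edge points to the new vertex; source ~ out-degree.
                src, dst = sources[rng.randrange(n_old)], new
            else:
                # Edge between existing vertices (probability R3 = 1 - R1 - R2).
                # Self-loops and repeated edges are not filtered out here.
                src = sources[rng.randrange(n_old)]
                dst = targets[rng.randrange(n_old)]
            sources.append(src)
            targets.append(dst)
    return sources, targets


def degree_histogram(endpoints):
    """Map degree k -> number of vertices with that degree."""
    degree_of_vertex = Counter(endpoints)
    return Counter(degree_of_vertex.values())


if __name__ == "__main__":
    src, dst = grow_wikigraph(20000)
    in_hist = degree_histogram(dst)
    for k in sorted(in_hist)[:10]:
        print(k, in_hist[k])
```

With R3 close to 0.88, many newly added vertices receive no edge at all in this sketch, so the degree histograms only cover vertices that appear in at least one edge.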
•Wikipedia growth model
The model can be solved analytically:
$P(k_{in}) \sim k_{in}^{-\gamma_{in}}$, with $\gamma_{in} = 1 + \frac{1}{1-R_2}$
$P(k_{out}) \sim k_{out}^{-\gamma_{out}}$, with $\gamma_{out} = 1 + \frac{1}{1-R_1}$
For the model we can use the empirical values R1 = 0.026, R2 = 0.091, R3 = 0.883, already measured for the English version of the Wikigraph. They give $\gamma_{in} = 2.100$ and $\gamma_{out} = 2.027$.

•Wikipedia growth model
The model can be solved analytically:
$K_{nn}^{in}(k_{in}) \sim M N^{1-R_1} \frac{R_1 R_2}{R_3}$ for $R_3 \neq 0$
$K_{nn}^{in}(k_{in}) \sim M R_1 R_2 \ln(N)$ for $R_3 = 0$
In both cases it is constant in $k_{in}$. The value of the constant also depends on the initial conditions.
[Figure: the two lines refer to two realizations of the model; in one of them the first 0.5% of the vertices have been removed.]

•Wikipedia growth model
• We have a structure that resembles the bow-tie of the WWW
• We have a power-law decay for the degree distributions and also a power-law decay for the number of updates per page
• Preferential attachment in the rewiring seems to be the driving force in the evolution of the system
• The microscopic structure of rewiring is very different from that of the WWW. In principle a user can change any series of edges and add as many pages as wanted. Still, most of the quantities are similar

•Wikipedia growth model
The finding that the PageRank of a page is not related to its number of visits opens a very interesting scenario for further research. Since, by definition, PageRank should give us the visit time of the page, and since it is actually completely independent of the number of visits, we wonder whether PageRank is a good measure of the authoritativeness of the pages in wikigraphs, and which modifications should be introduced in order to tune its performance.
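As a quick arithmetic check on the analytic solution of the growth model quoted above, the two degree exponents can be recomputed from the empirical rates for the English Wikigraph. The sketch below takes the exponents as positive numbers in $P(k) \sim k^{-\gamma}$, which is the reading consistent with the quoted values 2.100 and 2.027. The function name `degree_exponents` is hypothetical.

```python
def degree_exponents(r1, r2):
    """Return (gamma_in, gamma_out) for the R1/R2/R3 growth model.

    Assumes the convention P(k) ~ k**(-gamma), with
    gamma_in = 1 + 1/(1 - R2) and gamma_out = 1 + 1/(1 - R1).
    """
    gamma_in = 1.0 + 1.0 / (1.0 - r2)
    gamma_out = 1.0 + 1.0 / (1.0 - r1)
    return gamma_in, gamma_out


if __name__ == "__main__":
    # Empirical rates quoted in the slides for the English Wikigraph.
    g_in, g_out = degree_exponents(r1=0.026, r2=0.091)
    print(f"gamma_in = {g_in:.3f}, gamma_out = {g_out:.3f}")
    # prints "gamma_in = 2.100, gamma_out = 2.027"
```

Because R1 and R2 are both small, the two exponents stay close to 2, in line with the range quoted for the WWW in-degree.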
•River Networks
From satellite images one gets Digital Elevation Models (DEM).
From the DEM a spanning tree is computed (via steepest descent).
From the spanning tree, the number of points uphill is computed.

Elevations (DEM):
156.4  132.4  111.4
170.8  161.3  108.2
182.4  154.5  106.0

Number of points uphill (each point counts itself):
2  3  4
1  1  6
1  2  9

•River Networks
HACK’S LAW: $L_{\parallel} \sim A^{h}$

•River Networks
Data on Mars topography were collected through the Mars Orbiter Laser Altimeter (MOLA).

•River Networks
The result is that we can distinguish regions whose DEM networks have properties similar to those of river networks on Earth.
For rivers on Earth, $P(A) \propto A^{-1.43}$.

THE LIQUIDITY MARKET
Monetary Policy
[Diagram labels: ECB; Reserves]
Banks get liquidity from the ECB through auctions.
Monetary policy is carried out by the ECB to control interest rates.
BANKS MANAGE THEIR LIQUIDITY IN THE INTERBANK MARKET

The Market
Money Market
•The EUROPEAN CENTRAL BANK provides LIQUIDITY to European banks through weekly auctions.
•EVERY BANK must DEPOSIT with its NATIONAL CENTRAL BANK 2% of all deposits and debts issued in the last two years. These reserves are supposed to help in the case of liquidity shocks.
•The 2% value fluctuates in time and is recomputed every month.
Banks sell and buy liquidity to adjust to their liquidity needs and at the same time tend to reduce the value of the reserve.

The Market
Market Data
The interbank markets are basically managed by each European country. These markets are in almost all cases phone-based, which means that each bank has some brokers doing their transactions by phone.
The only exception is the Italian market, which is totally screen-based, implying that each bank’s operator can see real-time quotes of all other banks and make transactions.
The recent paper by Boss et al. investigates the network of overall credit relationships in the Austrian interbank market. In their study the authors analyze all the liabilities for ten quarterly single-month periods, between 2000 and 2003, among 900 banks. They find a power-law distribution of contract sizes, and a power-law decay of the distribution of incoming and outgoing links (a link between two banks exists if the banks have an overall exposure with each other). Furthermore, they show that the most vulnerable vertices are those with the highest centrality (measured by the number of paths that go through them).
A different issue has been explored by Cocco et al., who investigated the nature of lending relationships in the fragmented Portuguese interbank market over the period 1997-2001. In fragmented markets the amount and the interest rate of each loan are agreed on a one-to-one basis between borrowing and lending institutions. Other banks do not have access to the same terms, and no public information regarding the loan is available. The authors showed that frequent and repeated interactions between the same banks appear with a probability higher than that expected for random matching. In addition, they found that preferential lending relationships increased during illiquid periods, and in particular during the Russian financial crisis.

The Market
Market Data
Italian Interbank Money Market
For banks operating on the Italian market, the market for interbank deposits has been fully electronic since 1990 (e-Mid).
*) Daily volume: 18 billion Euros
*) 200 participants
We report here the analysis of 196 Italian banks (plus 18 banks from abroad that interact with them), which made 85202 transactions in 2000.
INTRODUCTION
Time activity
Two time scales: the day and the one-month maintenance period.

Statistical Properties
Market Data
The network shows a rather peculiar architecture. The banks form a disassortative network, where large banks interact mostly with small ones.

Statistical Properties
Market Data
Actually the banks form different groups, roughly related to their “size” when considering the average volume of money exchanged.

Statistical Properties
Degree Distributions
Using the latter quantity we can divide the banks into four groups (the same number of classes as in the Bank of Italy classification):
Group 1, with volume in the range 0-23 million Euro per day
Group 2, in the range 23-70 million Euro per day
Group 3, in the range 70-165 million Euro per day
Group 4, over 165 million Euro per day
In this way we find an overlap of more than 90% between the two classifications.

Communities
Separation of business
Two main communities emerge: many small banks and a few large banks.
[Figure: second eigenvector of the normal matrix.]

Modelling
Model of bank network
We assign to each of the N nodes (N is the size of the system) a value drawn from the previous distribution. The origin and destination vertices of an edge are chosen with a probability $p_{ij}$ proportional to the sum of their respective sizes $v_i$ and $v_j$. In formulas:
$p_{ij} = \frac{v_i + v_j}{\sum_{i,\, j>i} (v_i + v_j)}$
$\sum_{i,\, j>i} (v_i + v_j) = \frac{1}{2} \sum_{i,\, j \neq i} (v_i + v_j) = \frac{1}{2} \cdot 2(N-1) V_{tot} = (N-1) V_{tot}$, where $V_{tot} = \sum_i v_i$

MODELLING
Model and clustering
To quantify the agreement between experimental and simulated networks, we define an overlap parameter m specifying how well the model reproduces the observed clustering. We proceed in the following way. We define a matrix E, a 4 × 4 weighted matrix whose weights represent the number of connections between groups.
In order to measure the overlap between the matrices obtained from the data and from the computer model, we define a distance based on the differences between the elements of the matrices.

MODELLING
Model and clustering
We can define a distance between the numbers of intergroup edges in the experimental data and in the numerical simulation:
$d = \sum_{g,\, k \ge g} \left| E^{ex}_{g,k} - E^{num}_{g,k} \right|$
The sum of all elements is equal to $E_{tot}$ in both cases:
$\sum_{g,\, k \ge g} E^{ex}_{g,k} = \sum_{g,\, k \ge g} E^{num}_{g,k} = E_{tot}$
Therefore the maximum possible difference is $2E_{tot}$. This happens when all the links are between two groups in one case and between two other groups in the other. We use this maximum value to normalize the above expression, and we then define the overlap parameter m:
$m = 1 - \frac{d}{2E_{tot}}$
WE HAVE AN OVERLAP m = 98%

MODELLING
Model and clustering
To evaluate the relevance of the division into classes, we have to compare the value of $E_{g,k}$ with the corresponding quantity $E^{null}_{g,k}$ for a network where there is no division into classes (null hypothesis). The analytical expression for the null case is $E^{null}_{g,k} = E_{tot}/10$, where 10 is the number of possible couplings between the 4 groups.
The comparison between the two networks shows that a division into groups emerges in the real case: the table reports the value $E_{g,k}/E_{tot}$, in percent, for each possible combination of groups. In the null case, each element of the same matrix should be equal to 10.

Group   1   2   3   4
1       0   6   4   8
2       6   3   8  17
3       4   8   5  27
4       8  17  27  22

CONCLUSIONS
Market Data
Financial networks can help:
1. In distinguishing the behaviour of different markets
2. In visualizing important features such as the business role
3. In testing the validity of market models
They might be an example of scale-free networks even more general than those described by growth and preferential attachment.

CONCLUSIONS
Thanks to Giulie:
Giulia De Masi, Dep. Economics, Università delle Marche, Italy
Giulia Iori, Department of Economics, School of Social Science, City University, London, UK
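As a worked illustration of the overlap parameter m defined in the Modelling slides above, the short Python sketch below computes the distance d and the overlap between two 4 × 4 group matrices. The simulated matrix behind the reported m = 98% is not given in the slides. The example therefore compares the empirical table (read as percentages of $E_{tot}$, which is an assumption) with the null hypothesis of 10 for every coupling. The resulting number only illustrates the arithmetic and is not a result from the original work. The names `overlap`, `E_DATA` and `E_NULL` are hypothetical.

```python
def overlap(e_ex, e_num):
    """Overlap m = 1 - d / (2 * E_tot) between two symmetric group matrices.

    Sums run over the couplings (g, k) with k >= g, so each pair of groups
    is counted once. Both matrices must have the same total E_tot.
    """
    n_groups = len(e_ex)
    pairs = [(g, k) for g in range(n_groups) for k in range(g, n_groups)]
    e_tot = sum(e_ex[g][k] for g, k in pairs)
    e_tot_num = sum(e_num[g][k] for g, k in pairs)
    if abs(e_tot - e_tot_num) > 1e-9:
        raise ValueError("both matrices must sum to the same E_tot")
    d = sum(abs(e_ex[g][k] - e_num[g][k]) for g, k in pairs)
    return 1.0 - d / (2.0 * e_tot)


# Empirical table from the slides, read as percentages of E_tot (assumption).
E_DATA = [
    [0, 6, 4, 8],
    [6, 3, 8, 17],
    [4, 8, 5, 27],
    [8, 17, 27, 22],
]
# Null hypothesis: each of the 10 couplings carries 10% of the edges.
E_NULL = [[10] * 4 for _ in range(4)]

if __name__ == "__main__":
    print(f"overlap between data and null hypothesis: {overlap(E_DATA, E_NULL):.2f}")
```

The overlap of a matrix with itself is 1, and it drops to 0 when all edges sit in different couplings in the two matrices, which matches the normalization by $2E_{tot}$ described in the slides.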