Explain emergence of structure in the World Wide Web
Aggregation and competition under informational increasing returns

Presentation by: Petros Kavassalis
Contact at: ATLANTIS Group, University of Crete & ICS_FORTH, Greece, petros@itc.mit.edu
together with:
• Stelios LELIS, ATLANTIS Group, Univ. of Crete, Greece
• Charis LINA, ATLANTIS Group, Univ. of Crete, Greece
• Manolis PETRAKIS, Dpt of Economics & ATLANTIS Group, Univ. of Crete, Greece
• Jakka SAIRAMESH, IBM IAC, USA
Presentation at BT meeting: M. Vavalis, iCities Project Manager
BT/January 2003

Agenda
• A Web Simulated Economy (WSE)…
• …To explain agglomeration and fast growth in the Web
  • Network approach to “Web’s Hidden Order”
  • Urban explanations of the web sites’ fast growth and differentiated competition

iCities project (funded by FET)
[Diagram: the project components below feed the WSE and the open question “Design of iCities?”]
Economic Geography & Case studies
  • Modeling experience
  • Analysis of existing information cities
Internet Behavioral Models
  • Economic frameworks
  • Bounded rationality
  • User heterogeneous preferences
  • Sites with differentiated offerings
  • Info propagation networks
  • Sites linked hierarchically
  • Network externalities
Behavior Language
  • Conceptual framework
  • Behavioral rules
Simulation Framework
  • Speed
    Data-structure design
    Parallel/distributed execution
  • Scalability
  • Configurability (programmability)
    Multiple models
    Component-based
    Data structures/interfaces

A Web Simulated Economy (iCities WSE)
• Built on top of Mozart/Oz (SICS): a rigorous simulation environment
• Captures essential characteristics of the real web economy: agglomeration and a scale-free state in the distribution of population across web sites
• Capable of providing insight into empirical regularities: they result from the joint action of superposed networks
• Able to explain web organization and progressive, fast web formation: reveals patterns of Internet population clustering into web locations
• Reference: New Economic Geography (agglomeration in the real world, increasing returns; P. Krugman, B. Arthur)

What does EconGeo have to say to the Web?
P. Krugman, The Self-organizing Economy:
The geographical space reveals different forms of concentration of population and economic activity. These are not only the result of inherent differences between locations but also of some set of cumulative processes, necessarily involving some forms of increasing returns, whereby concentration can be self-reinforcing.
B. Arthur, Increasing Returns and Path Dependence in the Economy:
Increasing returns are the tendency for that which is ahead to get further ahead, for that which loses advantage to further lose advantage. They are mechanisms of increasing returns that operate to reinforce that which gains success or aggravate that which suffers loss.

Towards an economic geography of the Web
• H1: Heterogeneous populations of agents
• H2: Network structures matter
• H3: There are Informational Increasing Returns

H1: An economy with two populations...
• Internet users with partial information
• Web sites with performance varying over time

H2: Decision embedded in nets of interaction
[Diagram labels: units of action, with preferences and a portfolio of sites; social networks (word-of-mouth network or network externalities; increasing returns); underlying network of linkages, including navigation hierarchies (increasing returns)]

H3: Informational Increasing Returns
Networks carry increasing returns:
• Word-of-mouth information propagation network (social network with local ties and long distance relationships)
• Underlying network linking sites (navigation is hierarchical, produces “linkages”)
• Amazon.com-like network externalities (agglomeration benefit)

The issue: explain power law regularity
• A Web Simulated Economy (WSE)…
• …To explain agglomeration and fast growth in the Web
  • Network approach to “Web’s Hidden Order”
  • Urban explanations of the web sites’ fast growth and differentiated competition

Huberman’s diagnostic: Web Hidden Order!
• The distribution of Internet users per web site follows a universal power law
• A power law distribution is a straight line on a log-log scale
[Figure: proportion of sites versus number of users, log-log scale. Source: Xerox Internet Ecologies Project, AOL data]

We have reproduced it!
             % users volume (all sites)
% sites      Our results    Xerox results
0.1           9.28          32.36
1            56.79          55.63
5            85.27          74.81
10           92.77          82.26
50           98.96          94.92

Why is this important?
We provide a network-based explanation for the power law regularity!
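
The concentration table on the “We have reproduced it!” slide reports the share of user volume captured by the top x% of sites, and the Huberman slide notes that a power law is a straight line on a log-log scale. A minimal Python sketch of how both quantities could be computed follows. The site sizes are synthetic (an assumed rank-size rule with exponent 1), not the WSE or Xerox data, and the function names are illustrative only.

```python
import math

def top_share(sizes, top_percent):
    """Percent of total user volume captured by the top `top_percent` % of sites."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, round(len(ranked) * top_percent / 100))
    return 100.0 * sum(ranked[:k]) / sum(ranked)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Synthetic data (assumption): the site at rank r has size proportional to 1/r.
ranks = list(range(1, 10001))
sizes = [1_000_000.0 / r for r in ranks]

for p in (0.1, 1, 5, 10, 50):
    print(f"top {p}% of sites: {top_share(sizes, p):.2f}% of user volume")
print("log-log slope of size versus rank:", round(loglog_slope(ranks, sizes), 3))
```

Applying `top_share` to simulated and to measured site sizes at the same percentiles would give a side-by-side comparison in the form of the table above.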
Internet consumers:
[Diagram: web topology. Web location j with its inhabitants; infohabitant i with a social network and a portfolio of web sites. Parameter labels partly lost in extraction: “s, ri” for web locations; “c, lp, uv, d, s” for infohabitants]
• Surf the web
• Learn about web sites by asking other people (word-of-mouth) or by surfing from one site to another along hyperlinks
• Visit these sites, evaluate them and include them in a portfolio of frequently visited sites, FVS (U = performance + e)
• Have loyal behavior

What does this imply?
A network approach to the power law issue:
• Previous attempts: “random growth” models (from Simon to… Huberman)
• Question: Where does such growth come from?
• Direction: Krugman sees in percolation models one possible way around the problems with “random growth” models
• We took that way: online concentration should be the result of a process involving random transport networks
  • Word-of-mouth information diffuses over a social network structure linking Internet users
  • The sites link network transports users from one site to another (navigational hierarchies)

In a nutshell…
INFORMATIONAL INCREASING RETURNS
Networks carry increasing returns
Word-of-mouth network
  • Small world assumption
  • Watts-Strogatz (WS) beta model with new nodes entering the game
  • Short path length
  • Large clustering coefficient
Sites link network
  1. Small world (WS model)
  2. Scale free network (Barabasi)
     • Directed links
     • New nodes enter the game
     • Rewiring of existing links
     • Preferential attachment

Small world-Small World: findings (I)
[Figures: scatter plot of size versus age; scatter plot of size versus performance]

Small world-Small World: findings (II)
[Figures: evolution of growth rate for the site ranked at position 1; evolution of growth rate for the site ranked at position 125]

Small world-Small World: findings (III)
Sites that succeed in reaching the highest ranks belong to “neighborhoods” of highly visited sites

Small world-Small World: findings (IV)
• Word of mouth (centripetal)
• Exploration (centrifugal)
• Users’ loyalty (centrifugal)
• Clustering coefficient (centrifugal)
μ: power law exponent
γ: proportion of sites that are visited by at least one user at the final timestep

Small world-Scale free: findings (I)
Most findings are confirmed (slope: 1.4)

Small world-Scale free: findings (II)
[Figure: scatter plot of size versus performance and in-degree]

Small world-Scale free-Investments
• Sites’ performance varies over time
• Sites decide to make investments at predefined time intervals to improve their performance (and face clutter costs)
• Accumulated investments depreciate over time
• Investments are made on the basis of
  • Growth rate
  • Market share (for established sites)
• Investments produce a performance increment with a certain probability (there are attention costs)
• Entry strategies assume an investment to obtain good performance and a number of in-links
• Out-links also grow over time (algorithm for out-links growth)

Small world-Scale free-Investments: findings (I)
A power law distribution in site sizes is again obtained (in general and within categories)

Small world-Scale free-Investments: findings (II)
Sites’ growth rates fluctuate between time intervals in an uncorrelated fashion but about a positive mean value. This is evident in Huberman-Adamic’s data, and they use it as an assumption to build their model.
[Figure: fractional fluctuations in the number of users of the site ranked at position 60]

Small world-Scale free-Investments: findings (III)
Web sites’ age and popularity are slightly correlated. This is evident in Huberman-Adamic’s data.
[Figure: scatter plot of the number of unique visitors versus age]

Small world-Scale free-Investments: findings (IV)
In- and out-degree distributions of sites follow power laws.
[Figures: in-degree distribution; out-degree distribution]

Small world-Scale free-Investments: findings (V)
Slight correlation between the age of sites and their number of incoming links. This is evident in Huberman-Adamic’s data.
[Figure: scatter plot of the number of incoming links versus age]

Small world-Scale free-Investments: findings (VI)
• Again: relative performance is rewarded more than absolute performance
• A number of late entrants may survive and prosper (our model spans Huberman’s and Barabasi’s models)
• But: as economic variables enter the model directly, they are able to break down the power law stability
• Or, a power law distribution survives as long as new sites regularly enter the game (our assumption: exponential entry rate)
• Then? Instability? What kind of instability?

The issue: provide directly economic explanations
• A Web Simulated Economy (WSE)…
• …To explain agglomeration and fast growth in the Web
  • Network approach to “Web’s Hidden Order”
  • Urban explanations of the web sites’ fast growth and differentiated competition

An info-economy for experience goods
[Diagram: web topology. Web location j (performance, vector of products) with the users of web location j; user i (vector of preferences) with the portfolio of user i; search engine; externality]

Internet users
• Have preferences over content/service categories (e.g. books, Internet communication) and versions (generic/scientific, e-mail/instant messaging/chat rooms, etc.)
• Have a portfolio of frequently visited sites
• Find new sites to visit through:
  • Search engine: users periodically submit queries related to their preferences to a search engine
  • Exploration: users surf from one site to another following the links of the sites network
• Evaluate new sites and include in their portfolio the sites with the highest utility
• Are loyal to their portfolio sites; they include a new site in their portfolio after λ visits to that site (stickiness)
• Have a utility function that depends on
  • Site performance
  • Matching of user preferences and site offerings
  • Agglomeration benefit

Web sites
• Offer a vector of product versions in specific content/service categories
• Have a dynamic performance characteristic, r_j, that determines their performance in practice
• Periodically invest to improve their performance
• May offer services that provide an additional benefit (“agglomeration” benefit, AB) to their visitors: when agents make choices about web sites, they receive a payoff depending on the number of agents having already visited that site at the time of choice
• Configuration with 3 types of sites:

  Type                       Offering
  Specialized                n versions in 1 category + AB
  Partially Differentiated   [1…n] versions in 2 categories + AB with some probability
  Highly Differentiated      n versions in m categories

Model ingredients
Investment strategy
  • Conservative
  • Aggressive
Entry strategy
  • Initial investments
  • Strategic use of “in-links” opportunities
  • Strategic use of search engines’ promotion opportunities
Continuously updated sites link network
  • Sites implement a “where to link” strategy (according to category relatedness and popularity)
  • Random update
  • Growing number of out-links

Principal formal elements
[Formulas partly lost in extraction; the labels and the recoverable terms are listed]
• User i’s utility from site j (versions’ matching benefit): U_ij(t), formed from A_j(t) and a sum over the categories 1…C of per-category terms for user i and site j. The term for category c is built from S_ic, T_ic and the squared distance |x_ic - x_jc|^2.
• User i’s utility from site j (agglomeration benefit): U^A_ij(t), formed from A_j(t), a log term in ms_j(t), and 1/n_ij^match; the remaining factor, printed as “(nc 1) / sv 2”, is not recoverable.
• Site j’s market reach: ms_j(t), defined from V_j(t) and M(t).
• Site j’s performance: A_j, updated from P_uncer, log Inv_j(t'), a log term in ms_j(t), and ms_j(t); a ratio printed as “(1 a) / a” also appears.

Results (I)
• Evidence of concentration
• New entrants can enter top ranks

Results (II)
• The fast growth pattern is due to the various networks that are present (mostly the sites link network) and also depends on how search engines do their work
• Coexistence of Highly Diversified, Partially Diversified & Specialized sites
• The Agglomeration Benefit introduces interesting criticalities
• Early entry seems to be related to a higher probability of success (however, late entrants can survive and prosper)
• Strategic investment produces instability
• Speculation: would instability evolve into a “cable TV”-like industrial organization model?
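
The user-side rules described in the slides (discovery through word-of-mouth or exploration, adoption of the candidate with the highest utility, and an agglomeration benefit that grows with a site's audience) can be illustrated with a toy simulation. This is a minimal Python sketch under stated assumptions, not the WSE model: the utility form `performance + ab_weight * log(1 + share) + noise`, the single-site "portfolio", all parameter values, and all names are hypothetical.

```python
import math
import random

def simulate(n_sites=50, n_users=5000, wom=0.9, ab_weight=0.5,
             n_candidates=5, noise=0.1, seed=0):
    """Toy model: users arrive one by one and each adopts a single site.

    Candidates are discovered by word-of-mouth (copying the site of a random
    earlier user, with probability `wom`) or by exploration (a uniformly
    random site). The user adopts the candidate with the highest utility.
    Assumed utility: performance + ab_weight * log(1 + share) + noise.
    Returns the number of users per site.
    """
    rng = random.Random(seed)
    performance = [rng.random() for _ in range(n_sites)]
    sizes = [0] * n_sites
    adopted = []  # site adopted by each earlier user

    for _user in range(n_users):
        candidates = set()
        for _draw in range(n_candidates):
            if adopted and rng.random() < wom:
                candidates.add(rng.choice(adopted))     # word-of-mouth
            else:
                candidates.add(rng.randrange(n_sites))  # exploration
        best, best_u = None, -math.inf
        for j in sorted(candidates):
            share = sizes[j] / len(adopted) if adopted else 0.0
            u = (performance[j] + ab_weight * math.log(1 + share)
                 + rng.gauss(0, noise))
            if u > best_u:
                best, best_u = j, u
        sizes[best] += 1
        adopted.append(best)
    return sizes

def top_share(sizes, k=5):
    """Fraction of all users held by the k largest sites."""
    return sum(sorted(sizes, reverse=True)[:k]) / sum(sizes)

for wom in (0.0, 0.9):
    result = simulate(wom=wom)
    print(f"wom={wom}: top-5 sites hold {top_share(result):.0%} of users")
```

Copying the site of a random earlier user makes discovery proportional to current audience, which is one simple way to represent a centripetal word-of-mouth force; uniform exploration plays the centrifugal role. Comparing `top_share` across values of `wom` and `ab_weight` shows how each assumed force shifts concentration.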