Models and Structure of the Web Graph
Stefano Leonardi, Università di Roma "La Sapienza"
Thanks to A. Broder, E. Koutsoupias, P. Raghavan, D. Donato, G. Caldarelli, L. Salete Buriol

Complex Networks
• Self-organized and operated by a multitude of distributed agents, without a central plan
• Dynamically evolving in time
• Connections established on the basis of local knowledge
• Nevertheless efficient and robust, i.e.
– every vertex can be reached from every other in a few hops
– resilient to random faults
• Examples:
– the physical Internet
– the Autonomous System graph
– the Web graph
– the graphs of phone calls
– food webs
– e-mail
– etc.

Outline of the Lecture
• Small World Phenomena
– The Milgram experiment
– Models of the small world
• The Web Graph
– Power laws
– The Bow Tie structure
– Topological properties
– Experiments
– Algorithms
• Graph Models for the Web
– Mathematical models: Preferential Attachment, Copying model, Multi-layer
• The Internet Graph
– Geography matters
• Representation and Compression of the Web Graph

Small World Phenomena

Properties of Small World Networks
• Surprising properties of small worlds:
– the graph of acquaintances has small diameter
– there is a simple local algorithm that can route a message in a few steps
– the number of edges is small, linear in the number of vertices
– the network is resilient to edge faults

Small World Phenomena
• A small world is a network with
– a large fraction of short chains of acquaintances
– a small fraction of "shortcuts" linking clusters with one another, i.e. a superposition of structured clusters
– short distances
– no high-degree nodes needed
• There are lots of small worlds:
– spread of diseases
– electric power grids
– phone calls at a given time
– etc.

The model of Watts and Strogatz [1998]
1. Take a ring of n vertices, with every vertex connected to the next k = 2 nodes
2. Randomly rewire every edge to a random destination with probability p
3. The resulting graph has, with high probability, diameter O(log n), where diameter = max_{u,v} distance(u,v)

The model of Jon Kleinberg [2000]
• Consider a 2-dimensional grid
• For each node u, add an edge (u,v) to a vertex v selected with probability proportional to [d(u,v)]^(-r)
– If r = 0, v is selected uniformly at random, as in the Watts-Strogatz model
– If r = 2, the probability that v lies at distance between x and 2x from u is roughly the same for every scale x

Routing in the Small World
• Define a local routing algorithm that knows only:
– its position in the grid
– the position in the grid of the destination
– its set of neighbours, short range and long range
– the neighbours of all the vertices that have already seen the message
• If r = 2, the expected delivery time is O(log² n)
• If r ≠ 2, the expected delivery time is Ω(n^ε), where ε depends on r
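As an illustration of the greedy local routing rule above, the following is a minimal Python sketch of routing on an n x n grid in which every node also gets one long-range contact drawn with probability proportional to d(u,v)^(-r). The grid size, the single long-range link per node, the Manhattan distance and the lazy sampling of contacts are assumptions made only for this sketch, not part of Kleinberg's construction or analysis.

    import random

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def long_range_contact(u, n, r):
        # One long-range contact of u, chosen with Pr[v] proportional to d(u,v)^(-r).
        nodes = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
        weights = [manhattan(u, v) ** (-r) for v in nodes]
        return random.choices(nodes, weights=weights, k=1)[0]

    def greedy_route(src, dst, n, r):
        # Greedy routing: forward the message to the known neighbour
        # (grid neighbour or long-range contact) closest to the destination.
        # Contacts are sampled lazily; since the distance strictly decreases,
        # each node is visited at most once, so this does not change the model.
        steps, u = 0, src
        while u != dst:
            x, y = u
            neighbours = [(x + dx, y + dy)
                          for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= x + dx < n and 0 <= y + dy < n]
            neighbours.append(long_range_contact(u, n, r))
            u = min(neighbours, key=lambda v: manhattan(v, dst))
            steps += 1
        return steps

    if __name__ == "__main__":
        n = 50
        trials = [greedy_route((0, 0), (n - 1, n - 1), n, r=2) for _ in range(20)]
        print("average delivery time with r = 2:", sum(trials) / len(trials))

With r = 2 the measured delivery times stay small, while r = 0 or r = 4 tend to give longer routes, which is the dichotomy the slide states asymptotically.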
The Web Graph

Web graph
• Notation: a graph G = (V, E) is given by
– a set of vertices (nodes), denoted V
– a set of edges (links), i.e. pairs of nodes, denoted E
• The page graph (directed):
– V = static web pages (4.2 B, the number of pages indexed by Google on March 5)
– E = static hyperlinks (30 B?)

Why is it interesting to study the Web Graph?
• It is the largest artifact ever conceived by humankind
• Exploit the structure of the Web for
– crawl strategies
– search
– spam detection
– discovering communities on the web
– classification/organization
• Predict the evolution of the Web
– mathematical models
– sociological understanding

Many other web/internet related graphs
• Physical network graph
– V = routers
– E = communication links
• The host graph (directed)
– V = hosts
– E = there is an edge from host A to host B if some page on A links to some page on B
• The "cosine" graph (undirected, weighted)
– V = static web pages
– E = edges weighted by the cosine distance between the term vectors associated with the pages
• Co-citation graph (undirected, weighted)
– V = static web pages
– E = edge (x,y) weighted by the number of pages that refer to both x and y
• Communication graphs (which host talks to which host at a given time)
• Routing graph (how packets move)
• Etc.

Observing the Web Graph
• It is a huge, ever-expanding graph
• We do not know which percentage of it we know
• The only way to discover the graph structure of the web as hypertext is via large-scale crawls
• Warning: the picture might be distorted by
– size limitations of the crawl
– crawling rules
– perturbations of the "natural" process of birth and death of nodes and links

Naïve solution
• Keep crawling; when you stop seeing new pages, stop
• Extremely simple but wrong: crawling is complicated because the web is complicated
– spamming
– duplicates
– mirrors
• First example of a complication: soft 404
– When a page does not exist, the server is supposed to return the error code 404
– Many servers do not return an error code, but keep the visitor on the site, or simply send them to the home page

The Static Public Web
• Static
– not the result of a cgi-bin script
– no "?" in the URL
– doesn't change very often
– etc.
• Public
– no password required
– no robots.txt exclusion
– no "noindex" meta tag
– etc.

Static vs. dynamic pages
• "Legitimate static" pages built on the fly
– "includes", headers, navigation bars, decompressed text, etc.
• "Dynamic" pages that appear static
– browsable catalogs (a hierarchy built from a DB)
• Huge amounts of catalog and search-results pages
– Shall we count all the Amazon pages?
• Very low reliability servers
• Seldom connected servers
• Spider traps: infinite URL descent
– www.x.com/home/home/home/…./home/home.html
• Spammer games
• In some sense, the "static" web is infinite… In practice, the static web is whatever pages can be found in at least one major search engine.

Large scale crawls
• [KRRT99] Alexa crawl, 1997, 200M pages
– Trawling the Web for emerging cyber-communities, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins
• [BMRRSTW 2000] AltaVista crawls, 500M pages
– Graph structure in the web, A. Broder, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, J. Wiener
• [LLMMS03] WebBase Stanford project, 2001, 400M pages
– Algorithms and Experiments for the Webgraph, L. Laura, S. Leonardi, S. Millozzi, U. Meyer, J. Sibeyn, et al.

Graph properties of the Web (a toy-scale sketch of these measurements follows below)
– Global structure: how does the web look from far away?
– Connectivity: how many connections?
– Connected components: how large are they?
– Reachability: can one go from here to there? In how many hops?
– Dense subgraphs: are there good clusters?
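As a toy-scale illustration of the measurements listed above, the following Python sketch computes degree distributions and the largest weakly and strongly connected components of a small random directed graph using the networkx library. The library, the random-graph stand-in and the graph size are assumptions of the sketch only; at web scale these computations require the external-memory techniques discussed later in the lecture.

    from collections import Counter
    import networkx as nx

    # A small random directed graph as a stand-in for a web crawl.
    G = nx.gnp_random_graph(10_000, 0.0007, seed=0, directed=True)

    # Degree distributions: number of nodes with a given in-/out-degree.
    indeg_dist = Counter(d for _, d in G.in_degree())
    outdeg_dist = Counter(d for _, d in G.out_degree())

    # Connected components: weakly (undirected paths) vs. strongly (directed paths).
    largest_wcc = max(nx.weakly_connected_components(G), key=len)
    largest_scc = max(nx.strongly_connected_components(G), key=len)

    print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
    print("largest WCC:", len(largest_wcc), "largest SCC:", len(largest_scc))
    print("nodes with in-degree 0..5:", [indeg_dist[k] for k in range(6)])

On a real crawl the same quantities have to be computed with streaming and semi-external algorithms, since neither the edge list nor the visit data structures fit in main memory.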
Basic definitions
• In-degree = number of incoming links
– www.yahoo.com has in-degree ~700,000
• Out-degree = number of outgoing links
– www.altavista.com has out-degree 198
• Distribution = number of nodes with a given in-/out-degree

Power laws
• Inverse polynomial tail (for large k)
– Pr(X = k) ~ 1/k^α
– The graph of such a function is a line on a log-log scale: log Pr(X = k) = -α log k + const
– Very large values are possible and probable
• Exponential tail (for large k)
– Pr(X = k) ~ e^(-λk)
– The graph of such a function is a line on a log scale: log Pr(X = k) = -λk + const
– No very large values

Examples of power laws
– Inverse polynomial tail: the distribution of wealth, the popularity of books, etc.
– Inverse exponential tail: the distribution of age, etc.
– Internet network topology [Faloutsos et al., 99]

Power laws on the web
– Web page access statistics [Glassman 97; Huberman et al. 98; Adamic & Huberman 99]
– Client-side access patterns [Barford et al.]
– Site popularity
– Statistical physics: the signature of a self-similar/fractal structure

The In-degree distribution
• AltaVista crawl, 1999 (Broder et al.) and WebBase crawl, 2001 (Laura et al., 2003)
• In-degree follows a power law distribution: Pr[in-degree(u) = k] ∝ 1/k^2.1
• (a sketch of how such an exponent can be estimated from a degree histogram appears at the end of this part)

The Out-degree distribution
• AltaVista 1999, WebBase 2001
• Does out-degree follow a power law distribution? Pages with a large number of outlinks are less frequent than a power law would predict.

What is good about exponent 2.1
– Expected number of nodes with in-degree k: ~ n/k^2.1
– Expected contribution to the total number of edges: ~ (n/k^2.1)·k = n/k^1.1
– Summing over all k, the expected number of edges is ~ n
– This is good news: our computational power is such that we can deal with work linear in the size of the web, but not with much more than that
– The average number of incoming links is about 7!

More definitions
• Weakly connected component (WCC)
– a set of nodes such that any node can reach any other via an undirected path
• Strongly connected component (SCC)
– a set of nodes such that any node can reach any other via a directed path

The Bow Tie [Broder et al., 2000]
(figure: the bow-tie picture with the regions CORE/SCC, IN, OUT, TENDRILS, TUBES, DISCONNECTED; sample pages such as …/~newbie/, www.ibm.com, /…/…/leaf.htm)

Experiments (1): relative size of the bow-tie regions
Region     AltaVista '99   WebBase '01   WebBase '01 (number of nodes)
SCC        28%             33%           44,713,193
IN         21%             11%           14,441,156
OUT        21%             39%           53,330,475
TENDRILS   22%             13%           17,108,859
DISC       8%              4%            6,172,183
• The distribution of SCC sizes follows a power law: Pr[size of the SCC of u = i] ∝ 1/i^γ

Experiments (2): SCC size distribution, region by region (figure): power law with exponent 2.07

Experiments (3): In-degree distribution, region by region (figure): power law with exponent 2.1

Experiments (4): In-degree distribution in the SCC graph (figure): power law with exponent 2.1

Experiments (5): WCC size distribution inside IN (figure): power law with exponent 1.8

What did we learn from all these power laws?
• The second largest SCC has fewer than 10,000 nodes!
• IN and OUT have millions of access points to the CORE and thousands of relatively large weakly connected components
• This may help in designing better crawling strategies, at least for IN and OUT: the load can be split among the robots without much overlap
• While a power law with exponent 2.1 is a universal feature of the Web, there is no fractal structure: IN and OUT do not show any bow-tie phenomenon of their own

Algorithmic issues
• Apply standard linear-time algorithms for WCC and SCC
– hard to do if you cannot store the entire graph in memory!
– WCC is easy if you can store V (semi-external algorithm)
– no one knows how to do DFS in semi-external memory, so SCC is hard
– one might be able to compute an approximate SCC decomposition, exploiting the low diameter
• Random sampling for connectivity information
– find all nodes reachable from one given node (breadth-first search, BFS)
– BFS is also hard in external memory, but simpler on low-diameter graphs
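A minimal sketch of how an exponent such as the 2.1 reported above can be estimated from a degree histogram: on a log-log scale a power law is a straight line, so fit a line to log(count) versus log(degree). numpy and the synthetic heavy-tailed sample are assumptions made only for the illustration; the published exponents were measured on real crawl data and with more careful fitting procedures.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic heavy-tailed "in-degrees" as a stand-in for crawl data
    # (a Pareto-type tail with density ~ k^-2.1).
    degrees = np.floor(rng.pareto(1.1, size=200_000) + 1).astype(int)

    # Histogram: number of nodes with each in-degree value k >= 1.
    values, counts = np.unique(degrees, return_counts=True)

    # A power law Pr[k] ~ k^-alpha is a straight line of slope -alpha
    # on a log-log scale; estimate the slope by least squares.
    mask = counts >= 5                      # ignore the noisiest part of the tail
    slope, intercept = np.polyfit(np.log(values[mask]), np.log(counts[mask]), 1)
    print("estimated power-law exponent:", -slope)

Plain least-squares fitting on raw histograms is known to be biased; log-binning or maximum-likelihood estimators are preferable on real data, but the sketch conveys the idea behind the log-log plots of the experiments.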
Find the CORE
• Iterate the following process:
– pick a random vertex v
– compute all nodes reachable from v: O(v)
– compute all nodes that reach v: I(v)
– compute SCC(v) := I(v) ∩ O(v)
– check whether it is the largest SCC found so far
• If the CORE contains about 1/4 of the vertices, after 20 iterations the probability of not having found it is below 1%, since (3/4)^20 < 0.01

Find OUT
• OUT = the set of nodes reachable from the SCC, minus the SCC itself

Find IN
• IN = the set of nodes that reach the SCC, minus the SCC itself

Find TENDRILS and TUBES
• TENDRILS_IN = nodes reachable from IN that belong neither to the SCC nor to OUT
• TENDRILS_OUT = nodes that reach OUT and belong neither to the SCC nor to IN
• TENDRILS = TENDRILS_IN ∪ TENDRILS_OUT
• TUBES = TENDRILS_IN ∩ TENDRILS_OUT (paths from IN to OUT that bypass the CORE)

Find DISC
• DISCONNECTED: whatever is left

(2) Compute SCCs
• Classical algorithm:
– DFS(G)
– transpose G into G^T
– DFS(G^T), visiting vertices in decreasing order of f[v] (the finishing time of the first visit)
– every tree of the second DFS forest is an SCC
• DFS is hard to compute on secondary memory: no locality

DFS, classical approach

    main() {
        foreach vertex v do color[v] = WHITE
        foreach vertex v do
            if (color[v] == WHITE) then DFS(v)
    }

    DFS(u: vertex) {
        color[u] = GRAY
        time = time + 1;  d[u] = time            // discovery time
        foreach v in succ[u] do
            if (color[v] == WHITE) then
                p[v] = u
                DFS(v)
        color[u] = BLACK
        time = time + 1;  f[u] = time            // finishing time
    }

Semi-External DFS (1) (J.F. Sibeyn, J. Abello, U. Meyer)
• Memory budget: (12 + 1/8) bytes per vertex
• Compute the DFS forest in several iterations, until there are no forward edges: a forest is a DFS forest if and only if there are no forward edges

Semi-External DFS (2): data structures
• Adjacency-list representation of the partial DFS tree:
– n+1 integers pointing into the successor array
– n+k integers storing up to n+k successors (k ≥ n)
(figure: an array of n+1 pointers into an array of n+k adjacent vertices)

Semi-External DFS (3)
• While the DFS changes:
– add the next k edges to the current DFS forest
– compute a DFS on these n+k edges
– update the DFS forest

Computation of SCCs (figure: experimental results)

Is the Web a small world?
• Based on a simple model, [Barabási et al.] predicted that most pages are within 19 links of each other; they justified the model by crawling nd.edu (1999)
• Well, not really!

Distance measurements
• Experimental data (AltaVista):
– maximum directed distance between two CORE nodes: 28
– maximum directed distance between two nodes, given that a path exists: > 900
– average directed distance between two SCC nodes: 16
• Experimental data (WebBase):
– depth of IN = 8
– depth of OUT = 112
– maximum backward and forward BFS depth in the CORE = 8

More structure in the Web Graph
• Insight from hubs and authorities: a dense bipartite subgraph is the signature of a Web community, with hubs/fans on one side and authorities/centers on the other
• A large number of bipartite cliques, the cores of hidden Web communities, can be found in the Webgraph [R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, 99]
(figure: a (4,3) clique)

Disjoint bipartite cliques
• 200M crawl, Alexa 1997 (Kumar et al.) vs. 200M crawl, WebBase 2001
• More cyber-communities, and/or a better algorithm for finding disjoint bipartite cliques
(figure: number of disjoint bipartite cliques found in the two crawls)

Approach
• Find all disjoint (i,j) cores, 3 ≤ i ≤ 10, 3 ≤ j ≤ 10
• Expand such cores into full communities
• Enumerating all dense bipartite subgraphs is very expensive
• We run heuristics to approximately enumerate disjoint bipartite cliques of small size

Preprocessing
• Remove all vertices with |I(v)| > 50: we are not interested in popular pages such as Yahoo, CNN, etc.
• Use iterative pruning:
– remove all centers with |I(v)| < j
– remove all fans with |O(v)| < i

Enumeration of (i,j) cliques
1. For a vertex v, enumerate the size-j subsets S of O(v)
2. If |⋂_{u∈S} I(u)| ≥ i, then an (i,j) clique has been found
3. Remove the i fans and the j centers
4. Repeat until the graph is empty
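The following is a small in-memory sketch of the pruning and enumeration heuristic above, with the graph given as dictionaries of successor and predecessor sets. The data layout, the simplified pruning rule (a vertex is dropped when it can serve neither as a fan nor as a center) and the tiny i = j = 3 example are assumptions of this sketch; the experiments in the lecture used the semi-external, block-based variant described next.

    from itertools import combinations

    def disjoint_ij_cliques(out_adj, in_adj, i=3, j=3, popular=50):
        # out_adj[v] / in_adj[v]: sets of successors / predecessors of page v.
        # Preprocessing: drop very popular pages, then iteratively prune
        # vertices that can no longer be a fan (out-degree < j)
        # nor a center (in-degree < i).
        alive = {v for v in set(out_adj) | set(in_adj)
                 if len(in_adj.get(v, set())) <= popular}
        changed = True
        while changed:
            changed = False
            for v in list(alive):
                outd = len(out_adj.get(v, set()) & alive)
                ind = len(in_adj.get(v, set()) & alive)
                if outd < j and ind < i:
                    alive.discard(v)
                    changed = True

        cliques = []
        for v in sorted(alive):                       # candidate fan
            if v not in alive:
                continue
            succs = sorted(out_adj.get(v, set()) & alive)
            for S in combinations(succs, j):          # candidate set of j centers
                fans = alive & in_adj.get(S[0], set())
                for c in S[1:]:
                    fans &= in_adj.get(c, set())      # fans pointing to every center
                if len(fans) >= i:                    # (i, j) clique found
                    fan_side = sorted(fans)[:i]
                    cliques.append((fan_side, list(S)))
                    alive -= set(fan_side) | set(S)   # keep the cliques disjoint
                    break
        return cliques

    # Tiny example: three fans all pointing to the same three centers.
    out_adj = {f: {"c1", "c2", "c3"} for f in ("f1", "f2", "f3")}
    in_adj = {c: {"f1", "f2", "f3"} for c in ("c1", "c2", "c3")}
    print(disjoint_ij_cliques(out_adj, in_adj))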
Semi-external algorithm
• The lists of predecessors and successors are stored in N/B blocks
• Every block contains the adjacency lists of B vertices and fits in main memory
• Two 1-bit arrays, Fan() and Center(), are kept in main memory
• Phases I and II are easily implemented in streaming fashion
• Phase III, problem: given S, computing ⋂_{u∈S} I(u) needs access to later blocks

Semi-external algorithm, Phase III
• If we cannot decide on a set S within the current block, store S ∪ O(v) and the partial intersection ⋂_{u∈S} I(u) with the next block containing a vertex of S
• When a new block is moved to main memory, explore all the vertices in the block and continue the exploration of the sets S inherited from previous blocks

Computation of disjoint (4,4) cliques (figure: experimental results)

PageRank
• PageRank measures the steady-state visit rate of a random walk on the Web graph
• Imagine a random surfer:
– start from a random page
– at any time, follow an outgoing link chosen with equal probability (1/3 each in the figure)
• Problem: the Web is full of dead ends
• Teleporting:
– with probability c, choose a random outgoing edge
– with probability 1 - c, continue from a random node
• With teleporting there is a well-defined long-term visit rate for every page

PageRank computation
• Let a be the PageRank vector and A the transition matrix of the random walk
• If we start from distribution a, after one step we are at distribution aA
• PageRank is the principal left eigenvector of A: a = aA
• In practice, start from any vector a and repeatedly apply a := aA (i.e. a = aA², …, a = aA^k) until a is stable

PageRank distribution
• PageRank is distributed with a power law with exponent 2.1 [Pandurangan, Raghavan, Upfal, 2002], on a sample of 100,000 vertices from brown.edu
• In-degree and PageRank are not correlated!
• We computed PageRank on the WebBase crawl:
– confirmed that PageRank is distributed with a power law with exponent 2.1
– PageRank/in-degree correlation = 0.3
• Efficient external-memory computation of PageRank based on [T.H. Haveliwala, 1999]

Efficient computation of PageRank (figure)

Models of the Web Graph

Why study models for the Web Graph?
• To better understand the process of content creation
• To test Web applications
• To predict its evolution
• … and it is a challenging mathematical problem

Standard theory of random graphs (Erdős and Rényi, 1960)
• A random graph is built by starting with n vertices
• Each pair of vertices is connected by an edge with probability p
• Degrees are Poisson distributed: P(k) = e^(-pn) (pn)^k / k!

Properties of random graphs
• The probability that a degree is far from its expectation c = pn drops exponentially fast
• Threshold phenomena:
– if c < 1, the largest component has size O(log n)
– if c > 1, the largest component has size O(n), and all the others have size O(log n)
• If c = ω(log n), the graph is connected and the diameter is O(log n / log log n)
• We look for random graph models that reproduce the observed properties of the Web

Features of a good model
• It must evolve over time: pages appear and disappear
• Content creation is sometimes independent of and sometimes dependent on the current Web: some links are random, others are copied from existing links
• Content creators are biased towards good pages
• It must reproduce the relevant observables, i.e. the statistical and topological structures

Details of the models
• Vertices are created in discrete time steps
• At every time step a new vertex is created, which connects with d edges to existing vertices
• d = 7 in the simulations
• The endpoints of the edges are selected according to model-specific probability distributions

Evolving Network [Albert, Barabási, 1999]
• Growing network with preferential attachment:
1. Growth: at every time step a new node enters the system with d edges
2. Preferential attachment: the probability of connecting to a node depends on its degree, Π(k) ∝ k
• The resulting degree distribution is a power law, P(k) ~ k^(-α), with α = 2
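A minimal sketch of the growth and preferential-attachment rules just described: each new node attaches d edges to existing nodes chosen with probability proportional to their current in-degree plus one. The +1 smoothing (so that brand-new nodes can be chosen at all), the seed clique and the graph size are assumptions of the sketch, not part of the model as stated above.

    import random
    from collections import Counter

    def preferential_attachment_graph(n, d=7, seed=0):
        random.seed(seed)
        # Seed: d+1 nodes, each pointing to all the others.
        edges = [(u, v) for u in range(d + 1) for v in range(d + 1) if u != v]
        # `targets` holds one entry per unit of (in-degree + 1), so choosing a
        # uniform element of it picks a node with probability ~ in-degree + 1.
        targets = list(range(d + 1)) + [v for _, v in edges]
        for new in range(d + 1, n):
            chosen = set()
            while len(chosen) < d:                   # d distinct endpoints
                chosen.add(random.choice(targets))
            for v in chosen:
                edges.append((new, v))
                targets.append(v)
            targets.append(new)                      # the newcomer starts with weight one
        return edges

    edges = preferential_attachment_graph(20_000)
    indeg = Counter(v for _, v in edges)
    dist = Counter(indeg.values())
    print("number of nodes with in-degree k, for k = 7..14:",
          [dist[k] for k in range(7, 15)])

Plotting dist on a log-log scale shows the heavy tail that the uniform attachment of the Erdős–Rényi model cannot produce.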
A formal argument [Bollobás, Riordan, Spencer, Tusnády, 2001]
• The fraction of nodes with in-degree k is proportional to k^(-3) if d = 1
• Proof idea: consider 2n points and a uniformly random pairing of them
• Scan from the left; identify a vertex with each maximal run of consecutive left endpoints, up to and including the next right endpoint
• Preferential attachment fails to capture other aspects of the Web graph, such as the large number of small bipartite cliques

The Copying Model [Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins, Upfal, 2000]
• An evolving model: vertices are added one by one and point with d = 7 edges to existing vertices
– when inserting vertex v, choose a prototype vertex u uniformly at random from the graph
– with probability α, copy the j-th link of u; with probability 1 - α, choose a random endpoint
– the in-degree follows a power law with exponent 2.1 if α = 0.8
• The model is meant to capture the process of copying links from other related pages
• It tends to form Web communities whose hubs point to good authorities

Properties of the Copying model
• Let N_{t,k} be the number of nodes of in-degree k at time t
– lim_{t→∞} N_{t,k}/t ~ k^(-(2-α)/(1-α))
• Let Q(i,j,t) be the expected number of bipartite cliques K_{i,j} at time t
– Theorem: Q(i,j,t) is Ω(t) for small values of i and j

A Multi-Layer model of the Web Graph
• The Web is produced by the superposition of multiple independent regions, "Thematically Unified Clusters" [Dill et al., Self-Similarity in the Web, 2002]
• Regions differ in size and in aggregation criterion, for instance topic, geography or domain
• Regions are connected by a "connectivity backbone" formed by pages that belong to multiple regions, e.g. documents that are relevant for multiple topics

Multi-layer model [Caldarelli, De Los Rios, Laura, Leonardi, Millozzi, 2002]
• Model the Web as the superposition of different thematically unified clusters, generated by independent stochastic processes
• Different regions may follow different stochastic models of aggregation

Multi-layer model, details
• Every new vertex is assigned to a constant number c = 3 of layers, chosen at random out of L = 100 layers
• Every vertex is connected with d = 7 edges, distributed over its c layers
• Within every layer, edges are inserted using either the Copying model with α = 0.8 or the Evolving Network model
• The final graph is obtained by merging the graphs created in all the layers

Properties of the Multi-layer model
• The in-degree distribution follows a power law with exponent 2.1
• The result is stable under large variations of the parameters:
– the total number of layers L
– the number of layers to which every page is assigned
– the stochastic model used within a single layer

What about SCCs?
• All the models presented so far produce directed graphs without cycles
• Rewiring the Copying model and the Evolving Network model:
– generate a graph according to the model on N vertices
– insert a number of random edges, from 0.01 N up to 3 N
• In classical random graphs, a giant connected component suddenly emerges at c = 1

Size of the largest SCC / number of SCCs
• We observe the number of SCCs of size 1 and the size of the largest SCC
• Both measures show a smooth transition as the number of rewired edges increases
• There is no threshold phenomenon on SCCs for power-law graphs
• A similar result was recently proved by Bollobás and Riordan for WCCs of undirected graphs
(figures: Copying model with rewiring; Evolving Network model with rewiring; efficient computation of SCCs)
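A toy version of the rewiring experiment above: generate an acyclic copying-model graph, add an increasing number of random edges, and observe how the size of the largest SCC grows smoothly rather than jumping at a threshold. The networkx library, the small graph size and the parameter choices are assumptions made only to keep the sketch fast to run.

    import random
    import networkx as nx

    def copying_model(n, d=7, alpha=0.8, seed=0):
        # Acyclic copying model: each new vertex points d edges to older
        # vertices, copying the prototype's links with probability alpha.
        random.seed(seed)
        G = nx.DiGraph()
        G.add_nodes_from(range(d + 1))
        for v in range(d + 1, n):
            prototype = random.randrange(v)
            proto_links = list(G.successors(prototype))
            for k in range(d):
                if proto_links and random.random() < alpha:
                    target = proto_links[k % len(proto_links)]   # copy the k-th link
                else:
                    target = random.randrange(v)                 # random older vertex
                G.add_edge(v, target)
        return G

    G = copying_model(20_000)
    n = G.number_of_nodes()
    for frac in (0.0, 0.01, 0.1, 0.5, 1.0, 3.0):
        H = G.copy()
        H.add_edges_from((random.randrange(n), random.randrange(n))
                         for _ in range(int(frac * n)))
        largest = max(nx.strongly_connected_components(H), key=len)
        print(f"rewired edges = {frac:>4} * N  ->  largest SCC size = {len(largest)}")

With no rewiring the graph is acyclic, so every SCC is a single node; as random edges are added the largest SCC grows gradually, which is the smooth transition reported in the slides.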
Conclusions on models for the Web
• In-degree: all models give a power-law distribution for specific parameter values; the Multi-layer model achieves exponent 2.1 for a broad range of parameters
• Bipartite cliques: the Copying model manages to form a large number of bipartite cliques
• All models show a high correlation between PageRank and in-degree
• No model replicates the Bow Tie structure and the out-degree distribution

The Internet Graph

The map of the Internet, Burch and Cheswick [1999] (figure)

Faloutsos³ [1999]
• Plot of the frequency of the out-degrees in the Autonomous System graph (figure)
• How many vantage points do we need in order to sample most of the connections between Autonomous Systems?

The Web and the Internet are different!
• A model for a physical network should take geography into account
• Heuristically Optimized Trade-off (Carlson, Doyle)
• Power laws are the outcome of human activity, i.e. of a compromise between different contrasting objectives
• Network growth is a compromise between cost and centrality, i.e. between distance and good positioning in the network

The FKP model [Fabrikant, Koutsoupias, Papadimitriou, 2002]
• Vertices arrive one by one, uniformly at random in the unit square
• Vertex i connects to a previous vertex j < i (so the graph is a tree), where
– d(i,j) is the distance between i and j
– h_j is a measure of the centrality of node j, e.g. the average distance to the other nodes, or the average number of hops to the other vertices
• Vertex i chooses the j that minimizes α·d(i,j) + h_j, where α may depend on n

Results
• The in-degree distribution depends on α:
– if α < 1/√N, the tree is a star
– if α = Ω(√N), the degrees are exponentially distributed
– if 4 ≤ α = o(√N), the degrees are distributed with a power law

Challenges
• Exploit the knowledge of the Web graph to design better crawling strategies
• Design models for the dynamically evolving Web, e.g. model the rate of arrival of new connections over time
• On-line algorithms with sub-linear space to maintain topological and statistical information
• Data structures able to answer queries with time arguments

Web Graph representation and compression
Thanks to Luciana Salete Buriol and Debora Donato

Main features of Web Graphs
• Locality: usually most of the hyperlinks are local, i.e. they point to other URLs on the same host; the literature reports that on average 80% of the hyperlinks are local
• Consecutivity: links within the same page are likely to be consecutive with respect to the lexicographic order
• URL normalization: convert hostnames to lower case, canonicalize port numbers, re-introducing them where needed, and add a trailing slash to all URLs that do not have one

Main features of Web Graphs
• Similarity: pages on the same host tend to have many hyperlinks pointing to the same pages
• Consecutivity is the dual, distance-one notion of similarity

Literature
• Connectivity Server (1998), Digital Systems Research Center and Stanford University: K. Bharat, A. Broder, M. Henzinger, P. Kumar, S. Venkatasubramanian
• Link Database (2001), Compaq Systems Research Center: K. Randall, R. Stata, R. Wickremesinghe, J. Wiener
• WebGraph Framework (2002), Università degli Studi di Milano: P. Boldi, S. Vigna

Connectivity Server
• Tool for web graph visualisation, analysis (connectivity, page ranking) and URL compression; used by AltaVista
• Links are represented by an outgoing and an incoming adjacency list
• Composed of:
– URL Database: URL, fingerprint, URL-id
– Host Database: groups of URLs based on the hostname portion
– Link Database: URL, outlinks, inlinks

Connectivity Server: URL compression
• URLs are sorted lexicographically and stored as delta-encoded entries, i.e. each URL records only the difference with respect to the previous one (about a 70% reduction)
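A minimal sketch of the idea behind the URL compression just described: sort the URLs, then store each one as the length of the prefix it shares with its predecessor plus the remaining suffix. The exact record layout of the Connectivity Server is not reproduced here, and the example URLs are made up for the illustration; only the shared-prefix (delta) principle is shown.

    def delta_encode(urls):
        # Each entry: (length of the prefix shared with the previous URL, suffix).
        urls = sorted(urls)
        encoded, prev = [], ""
        for u in urls:
            common = 0
            while common < min(len(u), len(prev)) and u[common] == prev[common]:
                common += 1
            encoded.append((common, u[common:]))
            prev = u
        return encoded

    def delta_decode(encoded):
        urls, prev = [], ""
        for common, suffix in encoded:
            u = prev[:common] + suffix
            urls.append(u)
            prev = u
        return urls

    urls = ["http://www.example.it/index.html",
            "http://www.example.it/people/leonardi/",
            "http://www.example.it/people/leonardi/pub.html"]
    enc = delta_encode(urls)
    print(enc)
    assert delta_decode(enc) == sorted(urls)

Because hyperlinks are local and sorted URLs share long prefixes, the stored suffixes are short, which is where the reported 70% reduction comes from.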
URL delta encoding (figure)

Indexing the delta encoding (figure)

Link1: first version of the Link Database
• No compression: a simple representation of the outgoing and incoming adjacency lists
• Avg. inlink size: 34 bits; avg. outlink size: 24 bits

Link2: second version of the Link Database
• Single-list compression and compression of the starts array
• Avg. inlink size: 8.9 bits; avg. outlink size: 11.03 bits

Delta encoding of the adjacency lists
• Without compression, each array element is 32 bits long
• With delta encoding, each list stores differences: -3 = 101 - 104 (first item), 42 = 174 - 132 (subsequent items)

Nybble code
• The low-order bit of each nybble indicates whether or not there are more nybbles in the string
• The least-significant data bit encodes the sign
• The remaining bits provide an unsigned number
• Examples: 28 = 0111 1000, -28 = 1111 0010

Starts array compression
• The URLs are divided into three partitions based on their degree
• The elements of the starts array are indices into the nybble array
• The literature reports that 74% of the entries are in the low-degree partition

Starts array compression
Entry range          Partition                  Bits per entry
Z(x) > 254           high-degree partition      32
24 ≤ Z(x) ≤ 254      medium-degree partition    (32 + P·16)/P
Z(x) < 24            low-degree partition       (32 + P·8)/P
where Z(x) = max(indegree(x), outdegree(x)) and P is the number of pages in each block.

Link3: third version of the Link Database
• Interlist compression with a representative list
• Avg. inlink size: 5.66 bits; avg. outlink size: 5.61 bits

Interlist compression
• ref: relative index of the representative adjacency list
• deletes: set of URL-ids to delete from the representative list
• adds: set of URL-ids to add to the representative list
• LimitSelect-K-L: chooses the best representative adjacency list from among the previous K (= 8) URL-ids' adjacency lists, and only allows chains of fewer than L (= 4) hops

ζ-codes (WebGraph Framework)
• Interlist compression with a representative list
• Avg. inlink size: 3.08 bits; avg. outlink size: 2.89 bits

Compressing gaps (a sketch of this step appears at the end of this part)
• Uncompressed adjacency list vs. adjacency list with compressed gaps (figure)
• Successor list S(x) = {s1 - x, s2 - s1 - 1, ..., sk - s(k-1) - 1}
• Negative entries are mapped to natural numbers: 2x for x ≥ 0, 2|x| - 1 for x < 0

Using copy lists
• Uncompressed adjacency list vs. adjacency list with copy lists (figure)
• Each bit of the copy list states whether the corresponding successor of y is also a successor of x
• The reference list index ref is chosen as the value between 0 and W (the window size) that gives the best compression

Using copy blocks
• Adjacency list with copy lists vs. adjacency list with copy blocks (figure)
• The last block is omitted
• The first copy block is 0 if the copy list starts with 0
• The length is decremented by one for all blocks except the first one

Compressing intervals
• Adjacency list with copy lists vs. adjacency list with intervals (figure)
• Intervals are represented by their left extreme and their length
• Interval lengths are decremented by the threshold L_min
• Residuals are compressed using differences
• Worked examples: 0 = (15 - 15)·2, 600 = (316 - 16)·2, 5 = |13 - 15|·2 - 1, 3018 = 3041 - 22 - 1, 50 = ?

Compression comparison
Method    Inlink size (bits)  Outlink size (bits)  Access time  # pages (M)  # links (M)  Database
Huff.     15.2                15.4                 -            112          320          WebBase
Link1     34                  24                   13           61           1000         Web crawler Mercator
Link2     8.9                 11.03                47           61           1000         Web crawler Mercator
Link3     5.66                5.61                 248          61           1000         Web crawler Mercator
ζ-codes   3.25                2.18                 206          18.5         300          .uk domain
s-Node    5.07                5.63                 -            298          900          WebBase
(Access times were measured on different computers and with different compilers.)
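A minimal sketch of the gap-compression step referenced above: turn a sorted successor list into the gaps S(x), map the (possibly negative) first gap to a natural number with the 2x / 2|x| - 1 rule, and write each value with a simple Elias gamma code. The gamma code, the +1 shift used to handle zero gaps and the example successor list are assumptions of the sketch; the actual WebGraph framework combines gaps with copy lists, intervals and ζ codes, which are not reproduced here.

    def to_natural(x):
        # Map an integer to a natural number: x >= 0 -> 2x, x < 0 -> 2|x| - 1.
        return 2 * x if x >= 0 else 2 * (-x) - 1

    def gaps(x, successors):
        # S(x) = {s1 - x, s2 - s1 - 1, ..., sk - s(k-1) - 1}, successors sorted.
        s = sorted(successors)
        return [to_natural(s[0] - x)] + [b - a - 1 for a, b in zip(s, s[1:])]

    def elias_gamma(n):
        # Elias gamma code of n >= 1: (len(binary) - 1) zeros, then the binary digits.
        b = bin(n)[2:]
        return "0" * (len(b) - 1) + b

    def encode_adjacency(x, successors):
        # Gamma codes need positive integers, so encode each gap + 1.
        return "".join(elias_gamma(g + 1) for g in gaps(x, successors))

    # Node 15 with successors 13, 15, 16, 17, 18, 19, 23, 24, 203.
    succ = [13, 15, 16, 17, 18, 19, 23, 24, 203]
    print(gaps(15, succ))             # [3, 1, 0, 0, 0, 0, 3, 0, 178]
    print(encode_adjacency(15, succ))

Runs of consecutive successors become runs of zero gaps, which is exactly the consecutivity property that the interval encoding of the previous slides exploits further.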
Conclusions
• The compression techniques are specialized for Web graphs
• The average link size decreases as the graph grows
• The average link access time increases as the graph grows
• The ζ-codes seem to offer the best trade-off between average bit size and access time