XTreeNet: A Framework for Flexible Large Scale Information Dissemination & Retrieval
TaeWon Cho, Divesh Srivastava, K. K. Ramakrishnan, Yin Zhang and many others
AT&T Labs Research, NJ USA
August 2011
© 2008 AT&T Intellectual Property. All rights reserved.

Network as the Vehicle for Information Dissemination
• The ‘network’ will become (and already has become) increasingly information-centric
  – Information of all types is becoming electronic and network accessible
  – Access to information is based on content of interest, instead of location
• Information Overload - Scale: Producers and Consumers face challenges
  – Large number of producers (publishers; data sources)
  – Even larger number of consumers (subscribers, users querying/looking for content)
    o Tremendous number of information producers makes it difficult for a consumer to know where to find relevant information
  – Significant challenge: “whom and what to ask” & “whom and what to tell”
• XTreeNet looks at the various problems related to a network-based Information Dissemination and Retrieval environment
  – Obtain “information” of interest by asking the network to find it
  – Tell the network to deliver “information” of interest
  – Ask the network what “information” I should be interested in

Role of the Network in Information Dissemination
• Success of information aggregators (search engines etc.) is unquestionable
  – Information aggregators do play a key role
• Limitation:
  – Dis-intermediates producers: constrains business model of producers
• Timeliness and Coverage are also key criteria for information dissemination
  – Timeliness: Need information (including real-time) to be available right away
    o E.g., for a consumer to access real-time media content
    o Ability for the content to be withdrawn is also desirable
  – Coverage: Availability of information depends on the set of information that is made available to the consumer by intermediaries, like an aggregator
    o Information providers can be “dynamic”/transient. Complete coverage by an aggregator may be difficult
    o Desirable to enable information producers themselves to make it available on an as-needed basis
• Publish-subscribe based access has become somewhat popular
  – (E.g., news groups, RSS feeds)
• Information dissemination and Query-Response for Information Retrieval in a scalable manner is essential
  – Inherently N-to-N communication
  – We seek to exploit XML-tagging of information

XML Routing: Overlay Services based on XML
[Figure: XML overlay network above the IP network infrastructure. Labels: XML router, XML Overlay Network, Data query generation, Database, Subscriber for alerts, Subscriber for information, Publisher]
• An XML Network: overlay network of XML switches/routers
• XTreeNet project: investigate the design for a large-scale integrated publish/subscribe + query/response application
• How can we partition functions between the overlay and underlay?

XTreeNet Overview
• Publishers and Subscribers submit Content Descriptors (CDs) to the network
• As soon as a CD (from producer or consumer) hits the network, map it into a single hash-id at the first overlay router
  – Subsequent routers forward downstream based on hash-id: much more efficient than matching against aggregated query filters
• XTreeNet builds a common Core-based tree (CBT) on a per-“CD” basis; integrates both producers and consumers of information
  – Dynamically create CBT on first arrival of CD from producer
• Groups (overlay multicast) formed on an as-needed basis for each CD
  – Very fine grained distribution tree connecting producers & consumers
  – Branches to subscribers for disseminating published content & branches to publishers for forwarding queries
  – Different cores for different CDs – reduce likelihood of traffic concentration

Content Descriptors
• Content Descriptors (CDs) act like “indexes” in a distributed database environment
  – Each data item generated by a producer and each consumer query filter are independently mapped to a set of CDs
  – A data item matches a query when respective sets of CDs have at least one CD in common
• CDs decouple producers from the consumers
  – Can support heterogeneous producer schemas
• CD can be an element of a topic hierarchy; multiple hierarchies may be supported (e.g., topics, geographic location)
  – An XML schema path (root-to-leaf path) may also be used as basis of hierarchically structured domain for constructing CDs
    o Disambiguate between multiple XML documents using string values at leaves

  <rss>
    <channel>
      <editor> Jupiter </editor>
      <item>
        <title> ReutersNews </title>
        <link> reuters.com </link>
      </item>
      <description> abc </description>
    </channel>
  </rss>

[Figure: tree view of the example document. Nodes: rss, channel, editor, item, title, link, description, with the string values (e.g., Jupiter, ReutersNews) at the leaves]
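
The Content Descriptors and XTreeNet Overview slides can be illustrated with a small sketch. It is not the XTreeNet implementation: the function names (`fullpath_cds`, `cd_hash_id`, `matches`) are hypothetical, and SHA-1 is only an assumed stand-in, since the slides do not say which hash produces the hash-id. The sketch derives root-to-leaf path CDs from the example document, maps a CD to a single id, and applies the "at least one CD in common" matching rule.

```python
import hashlib
import xml.etree.ElementTree as ET

# The example document from the Content Descriptors slide.
EXAMPLE_DOC = """
<rss>
  <channel>
    <editor> Jupiter </editor>
    <item>
      <title> ReutersNews </title>
      <link> reuters.com </link>
    </item>
    <description> abc </description>
  </channel>
</rss>
"""


def fullpath_cds(xml_text):
    """Return one CD per root-to-leaf path, with the leaf string value appended."""
    cds = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            # Leaf: the string value disambiguates documents sharing a schema path.
            value = (elem.text or "").strip()
            cds.add(f"{path}/{value}" if value else path)
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return cds


def cd_hash_id(cd):
    """Map a CD to a single id (SHA-1 is an assumption, not from the slides)."""
    return hashlib.sha1(cd.encode("utf-8")).hexdigest()


def matches(item_cds, query_cds):
    """A data item matches a query when the two CD sets share at least one CD."""
    return not item_cds.isdisjoint(query_cds)


if __name__ == "__main__":
    item = fullpath_cds(EXAMPLE_DOC)
    query = {"/rss/channel/item/title/ReutersNews"}
    print(sorted(item))
    print(matches(item, query), cd_hash_id(next(iter(query))))
```

Because producers and consumers map to CDs independently, only the CD string (or its id) has to agree between them; neither side needs to know the other's schema.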

Scalability of CDs
• Publisher guidance
  o Information publisher provides guidance on which XML tags are of potential interest
• Strategies
  o Fullpath: /rss/channel/item/title/ReutersNews
  o Last Tag: /title/ReutersNews
  o Keyword: ReutersNews
• Estimated by extracting CDs from XML version of Wikipedia
• ~ 5M CDs for about 1M articles and grows slowly – duplication of CDs in documents
[Figure: Unique CDs generated by Wikipedia articles. x-axis: # of Wikipedia articles; y-axis: # of unique CDs; series: Fullpath, Last Tag, Keyword, Last Tag + Keyword]

Scalable Multicast: Multicast Architecture with Adaptive Dual-state
• Multicast is key to efficient information dissemination
• Requirements for Information-centric Multicast:
  – Scalability in group membership
    o Fine granularity of access support for large number of groups
  – Persistent access to group
    o Network should be responsible for maintaining group membership unless users explicitly un-subscribe from group
  – Minimize loss of information
  – Keep control traffic scalable
• Limitations of existing IP / Overlay Multicast
  o Forwarding state grows linearly with number of groups – State overhead (at multiple routers)
  o Soft-state needs to be refreshed – Control overhead
  o Hence, limits scalability and has inadequate persistence
• How to achieve scalable and persistent multicast?
• MAD seeks to solve issues of scale and persistence with multicast
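
Returning to the three strategies on the Scalability of CDs slide, the following is a minimal sketch of how each one turns a leaf into a CD. The function `make_cd`, the strategy names as strings, and the four-leaf toy corpus are all hypothetical; the corpus only illustrates why shorter CDs repeat more often across documents and says nothing about the Wikipedia measurements.

```python
def make_cd(tags, value, strategy):
    """Build a CD from a root-to-leaf tag list and the leaf's string value."""
    if strategy == "fullpath":
        return "/" + "/".join(tags) + "/" + value
    if strategy == "last_tag":
        return "/" + tags[-1] + "/" + value
    if strategy == "keyword":
        return value
    raise ValueError(f"unknown strategy: {strategy}")


def unique_cd_count(leaves, strategy):
    """Count distinct CDs produced for a collection of (tags, value) leaves."""
    return len({make_cd(tags, value, strategy) for tags, value in leaves})


# Hypothetical toy corpus: four leaves taken from differently shaped documents.
TOY_LEAVES = [
    (["rss", "channel", "item", "title"], "ReutersNews"),
    (["feed", "entry", "title"], "ReutersNews"),
    (["rss", "channel", "editor"], "Jupiter"),
    (["rss", "channel", "item", "author"], "Jupiter"),
]

if __name__ == "__main__":
    for strategy in ("fullpath", "last_tag", "keyword"):
        print(strategy, unique_cd_count(TOY_LEAVES, strategy))
```

In this toy corpus the shorter forms collapse more leaves onto the same CD, which is the duplication effect the slide points to; the trade-off is that a shorter CD is also less specific.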

Group Memberships Lifetime & Activity Level
• Membership (e.g., in a pub-sub environment) likely to be long-lived
  – Users subscribe, and remain interested in receiving information even when publishers distribute infrequently
  – Only 2.3% of groups see a reduction
  [Figure: Subscription count to YouTube channels]
• Long-lived membership results in
  – Network state grows for group; increased group size
• Group activity can vary widely
  – Analyzed publishing activity of RSS feeds
    o Only 5% of RSS feeds publish more than 100 updates/month
    o Median rate is 10 updates/month
  – 10% most active feeds contribute 75% of updates
  [Figure: RSS: Publishing rate (# updates/month)]
• IP multicast: Inactive groups usually treated the same as an active group
  o But can’t afford loss of information

Using an IP-Multicast Style Approach
• Every intermediate router has to maintain state
  o Forwarding state grows linearly with number of groups – State overhead (at multiple routers)
  o Soft-state needs to be refreshed – Control overhead
• A lot of routers maintain forwarding state:
  – 6 intermediate routers keep state that has to be constantly refreshed
  – 4 first-hop routers also keep state
[Figure: 16-router topology (routers 00-15). Legend: First-hop router (FH), Forwarder, Router not participating, User]

The MAD environment
• MAD multicast service overlay consists of a set of logical overlay routers
• Each logical router serves as a single aggregated local subscriber for all users attached to it
• Subscription manager responsible for all the users’ subscription management
  – Maintains subscriptions for users connected to site

Differentiate the Roles of Multicast State
• Membership State vs. Forwarding State
• Group membership can be separated from forwarding state
  – Group membership must be stored scalably and persistently
    o Especially for groups that have low frequency of information flow
  – Forwarding state: efficient forwarding of active groups
    o Can be re-generated when a group becomes active
• Active and inactive groups can be treated differently
  – Small percent of (active) groups generate data at a high rate: forward efficiently
  – Large percent of (inactive) groups generate low traffic volume

The MAD Solution
• Group membership is separated from forwarding state: Multicast with Adaptive Dual State
• Use Membership Tree (MT) for scalable state maintenance
  – Store group membership information in MT
    o Minimize number of intermediate routers keeping group state
  – Impose static virtual hierarchy => no control overhead
    o But, static hierarchy may not result in optimal delivery path
• Use Dissemination Tree (DT) for forwarding efficiency
  – Use DT for active groups
    o Can use any “state-of-art” multicast protocol
• MAD may begin as an overlay multicast service
  – Use IP multicast to improve forwarding efficiency for DT
  – MT may also eventually evolve to being supported by the underlay
• MAD achieves best of both worlds - scalability and forwarding efficiency

MAD Membership Tree protocol overview
• Goal of Membership Tree: reduce # routers keeping multicast group state
• MT selects the core (root) based on hash of group ID
  – Define a single base tree at this root (static)
  – All groups selecting this root use the base tree to construct MT
• Subscriber join is forwarded up on the base tree until it reaches first on-tree node for this group’s MT
  – When a subtree rooted at an en-route router has more than a min. # of first-hop routers with attached subscribers, the parent node on the MT requires that the en-route router join the MT
• MAD protocol provides for seamless transition to switch from DT to MT as level of group activity changes (reduces) over time

Routers Maintaining State in MAD
[Figure: Base Tree over routers 00-15; Membership Tree (4 first-hops, 5 users); virtual membership tree (fan-out 8, aggregation threshold 2)]
• Fewer routers maintain state:
  – 2 intermediate routers and 4 FH routers
• Forwarding by multicast/unicast – not necessarily efficient
• MT reduces number of routers keeping Multicast State by aggregating subscriber state in a virtual sub-tree

Scalability of Multicast with MAD
• Evaluation using simulation and measurements with implementation
  – Implementation measured on Emulab with about 100 routers
  – Simulation with 16,000 routers; Power-law topology
• MAD achieves both efficient state maintenance and efficient forwarding
• Forwarding efficiency with MAD is as good as IP multicast (DT)
• State efficiency with MAD is significantly better than IP multicast-like approaches (DT)
[Figures: Total Delay (msec) vs. Number of First-Hop Routers in a Group; Number of Groups (Trillions) vs. Number of First-Hop Routers in a Group]
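
The join procedure from the MAD Membership Tree protocol overview slide can be sketched as follows. This is a simplified model under several assumptions, not the MAD protocol itself: the class and function names are hypothetical, SHA-1 stands in for the unspecified hash of the group ID, first-hop routers are assumed to be leaves of the base tree, "more than a min. #" is read as a strict `>` comparison, and at most one en-route router is promoted per join.

```python
import hashlib


def select_core(group_id, cores):
    """Pick the core (root) from a hash of the group ID (SHA-1 is an assumption)."""
    digest = hashlib.sha1(group_id.encode("utf-8")).digest()
    return cores[int.from_bytes(digest[:8], "big") % len(cores)]


class BaseTree:
    """Static virtual hierarchy given as node -> parent (the root maps to None)."""

    def __init__(self, parent):
        self.parent = parent

    def path_to_root(self, node):
        path = [node]
        while self.parent[path[-1]] is not None:
            path.append(self.parent[path[-1]])
        return path


class MembershipTree:
    """Per-group membership state kept only at on-tree routers."""

    def __init__(self, base, root, threshold):
        self.base = base
        self.threshold = threshold
        # on-tree router -> first-hop routers it tracks directly
        self.state = {root: set()}

    def join(self, first_hop):
        """Forward a join up the base tree to the first on-tree node."""
        path = self.base.path_to_root(first_hop)
        idx = next(i for i, node in enumerate(path) if node in self.state)
        on_tree = path[idx]
        self.state[on_tree].add(first_hop)
        if idx >= 2:
            # The en-route router is the child of the on-tree node on this path.
            en_route = path[idx - 1]
            below = {fh for fh in self.state[on_tree]
                     if en_route in self.base.path_to_root(fh)}
            if len(below) > self.threshold:
                # Parent asks the en-route router to join and hands over state.
                self.state[on_tree] -= below
                self.state[en_route] = below
        return on_tree


if __name__ == "__main__":
    base = BaseTree({"r": None, "a": "r", "b": "r",
                     "f1": "a", "f2": "a", "f3": "a", "f4": "b"})
    mt = MembershipTree(base, root="r", threshold=2)
    for fh in ("f1", "f2", "f3", "f4"):
        mt.join(fh)
    print(mt.state)
    print(select_core("group-42", ["r", "x", "y"]))
```

In the small example, router `a` joins the tree only once three first-hop routers below it have subscribers, while `b` keeps no state because a single first-hop router below it can be tracked at the root.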

Summary
• XTreeNet: project we have been working on – primarily focused on the meta-data plane
  – XTreeNet Architecture – complex processing at the edges; efficient forwarding in the core
  – MAD: Scalable Multicast – Large # groups; Large # subscribers
  – QDTs: Query Distribution Trees for Distribution of Complex Queries
  – Load Balancing, Privacy Preservation, Censorship Resistance
  – Recommendation Systems: Scalable, Privacy Preserving
• More recent work: “COPSS: An Efficient Content-Oriented Publish/Subscribe System” in collaboration with colleagues from University of Goettingen, Germany