T-110.6120
17.11.2011
Jimmy Kjällman
Ericsson Research, NomadicLab
• Two research prototypes will be described in this presentation
• Blackadder
– Developed in PURSUIT
– Channel-oriented base implementation
– Demonstrated at the end of the lecture
• Blackhawk
– Originates from PSIRP
– Document-oriented implementation
Original slides:
George Parisis, Computer Laboratory, University of Cambridge, 2011
• Realizes PURSUIT’s functional model for information-centric networking
(Figure: PURSUIT functional model. Recoverable labels: Pub/Sub Service Model; Dissemination Strategy; Rendezvous, Topology, and Forwarding functions; functional scoping and information scoping; SId and RId identifiers; recursion.)
• Scopes, subscopes, information items
• Information is structured as a directed acyclic graph
• IDs are (statistically) unique within a scope
– (Possibly) self-generated, flat labels
– Same ID space for both subscopes and information items
• “Complete” identifier: Prefix + ID
– An item can have several complete identifiers: one per path from one of the graph’s roots
(Figure: example information graph of scopes and information items; node labels shown include 0001, 0002, 0003, AAA0, and AAA1.)
Information ID: /0003/0002/AAA2
Scope ID: /0001/0001/0001, /0002/0001/0001, /0003/0001/0001
• Simplified example
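The "Prefix + ID" rule can be illustrated with a short sketch. The graph below is hypothetical (its labels are not taken from the figure), and the helper name `complete_ids` is an assumption made only for this illustration. It enumerates every path from a root to a node; each path is one complete identifier.

```python
# Hypothetical information graph: node ID -> list of parent IDs.
# Roots have no parents. IDs are flat labels, unique only within a scope.
PARENTS = {
    "0001": [],                # root scope
    "0002": [],                # root scope
    "00A1": ["0001", "0002"],  # subscope published under two roots
    "00B7": ["00A1"],          # information item
}


def complete_ids(node, parents=PARENTS):
    """Return every complete identifier (prefix + ID) for `node`."""
    if not parents[node]:
        return ["/" + node]
    identifiers = []
    for parent in parents[node]:
        for prefix in complete_ids(parent, parents):
            identifiers.append(prefix + "/" + node)
    return identifiers


if __name__ == "__main__":
    for identifier in complete_ids("00B7"):
        print(identifier)
```

Because the graph is acyclic, the recursion always terminates; an item reachable from two roots simply ends up with two complete identifiers.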
• Defines the methods used for implementation (of a scope)
– Architectural components
– Data formats
– Governance structures
– Etc.
• Can be “overridden” for sub-items – if permitted
– Strategies have to be aligned
• Usually engineered at design time
• Larger problem solutions through the assembly of smaller ones
• Publish/Subscribe
• For example:
– publish_scope(id, prefix, strategy)
– publish_info(id, prefix, strategy)
– unpublish_scope(id, prefix, strategy)
– unpublish_info(id, prefix, strategy)
– subscribe_scope(id, prefix, strategy)
– subscribe_info(id, prefix, strategy)
– unsubscribe_scope(id, prefix, strategy)
– unsubscribe_info(id, prefix, strategy)
– publish_data(id, strategy, data, data_len)
– getEvent(&event)
(Figure: Blackadder node architecture. Components shown: applications App1 … AppN; IPC Element; Local Proxy; Rendezvous; Topology Manager; Forwarding; Communication Elements attached to /dev/eth0, /dev/eth1, and raw IP sockets.)
Background Information:
• Click is an external framework that Blackadder uses
• Open source platform for building packet processing configurations that consist of connected elements
– Language for describing router configurations
– Ready-made elements
– Libraries for creating new elements as C++ classes
• Portable code
– Kernel and userlevel
– Linux, FreeBSD, Mac OS X, etc.
• Modular design approach
– Reuse of elements in different configurations (e.g., in different prototypes or experiments)
• Basic operation: packets are pushed or pulled between elements
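The push/pull distinction can be sketched in a few lines of Python. This is not the Click API: the class names (`Element`, `Counter`, `PacketQueue`, `Sink`) are hypothetical stand-ins, meant only to show that an upstream element pushes packets into a queue while a downstream element pulls them out when it is ready.

```python
class Element:
    """Minimal stand-in for an element with a single output port."""

    def __init__(self):
        self.downstream = None

    def connect(self, downstream):
        self.downstream = downstream
        return downstream  # allows chaining: a.connect(b).connect(c)

    def push(self, packet):
        if self.downstream is not None:
            self.downstream.push(packet)


class Counter(Element):
    """Push element: counts packets and passes them on unchanged."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def push(self, packet):
        self.count += 1
        super().push(packet)


class PacketQueue(Element):
    """Push input, pull output: stores packets until someone pulls."""

    def __init__(self):
        super().__init__()
        self.packets = []

    def push(self, packet):
        self.packets.append(packet)

    def pull(self):
        return self.packets.pop(0) if self.packets else None


class Sink:
    """Pull element: asks its upstream for a packet when ready to send."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.sent = []

    def run_once(self):
        packet = self.upstream.pull()
        if packet is not None:
            self.sent.append(packet)
        return packet


counter = Counter()
queue = PacketQueue()
counter.connect(queue)
sink = Sink(queue)

counter.push(b"first")
counter.push(b"second")
sink.run_once()  # pulls b"first" out of the queue
```

The queue is the point where a push path meets a pull path, which is the role Queue plays between a packet source and ToDevice in a Click configuration.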
• Example: Ping (nothing to do with Blackadder, just illustrates a Click router)

Click Router Configuration

define($DEV eth0, $DADDR 8.8.8.8, $GW $DEV:gw)

FromDevice($DEV, SNIFFER false)
    -> c :: Classifier(12/0800, 12/0806 20/0002)
    -> CheckIPHeader(14)
    -> ip :: IPClassifier(icmp echo-reply)
    -> ping :: ICMPPingSource($DEV, $DADDR)
    -> SetIPAddress($GW)
    -> arpq :: ARPQuerier($DEV)
    -> IPPrint
    -> q :: Queue
    -> ToDevice($DEV);
arpq[1] -> q;
c[1] -> [1] arpq;

(Figure: element graph of this configuration: FromDevice, Classifier c, CheckIPHeader, IPClassifier ip, ICMPPingSource ping, SetIPAddress, ARPQuerier arpq, IPPrint, Queue q, ToDevice.)
• Implements a Netlink socket for receiving pub/sub requests from applications (or an API library) and for sending back pub/sub events and published data
– These are sent as messages through the socket
– In user space, the IPC element utilizes the selection mechanism provided by Click
– In kernel space, the element receives sk_buffs in the context of the running process – buffers are wrapped into Click packets that are later processed by a Click task
• Everything is asynchronous – like an event-based system
• publish_scope(id, prefix, strategy)
• publish_info(id, prefix, strategy)
• unpublish_scope(id, prefix, strategy)
• unpublish_info(id, prefix, strategy)
• subscribe_scope(id, prefix, strategy)
• subscribe_info(id, prefix, strategy)
• unsubscribe_scope(id, prefix, strategy)
• unsubscribe_info(id, prefix, strategy)
(Figure: request message layout. Recoverable labels: ID, Prefix ID, length; field sizes are either 1 or variable length.)
• publish_data(id, strategy, data, data_len)
(Figure: publish_data message layouts. Recoverable fields: 1-byte fields, a variable-length ID, a LIPSIN Identifier of LID size, and Data.)
(These messages are only used node-internally)
• Start Publishing, Stop Publishing
• New Scope, Deleted Scope
(Figure: notification message layout: two fields of size 1 followed by a variable-length ID.)
• Published Data
(Figure: Published Data message layout: two fields of size 1, a variable-length ID, then Data.)
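A minimal sketch of how such a Published Data message could be packed and parsed follows. The slide only shows two fields of size 1, a variable-length ID, and the data; treating the two small fields as an event type and an ID length (in bytes) is an assumption, and the type code `PUBLISHED_DATA = 0x64` and the function names are hypothetical.

```python
import struct

PUBLISHED_DATA = 0x64  # hypothetical type code, chosen only for this sketch


def pack_event(event_type, item_id, data=b""):
    """Assumed layout: [type: 1 byte][ID length: 1 byte][ID][data]."""
    if len(item_id) > 255:
        raise ValueError("ID does not fit a one-byte length field")
    return struct.pack("!BB", event_type, len(item_id)) + item_id + data


def unpack_event(message):
    """Inverse of pack_event: returns (type, ID, data)."""
    event_type, id_len = struct.unpack_from("!BB", message, 0)
    item_id = message[2:2 + id_len]
    data = message[2 + id_len:]
    return event_type, item_id, data


message = pack_event(PUBLISHED_DATA, b"\x00\x01\x00\xb7", b"hello")
event_type, item_id, data = unpack_event(message)
```

An explicit length prefix lets the receiver separate the variable-length ID from the payload without any delimiter.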
• Standard Click elements for network communication
– ToDevice and FromDevice for directly sending and receiving Ethernet frames
• Suitable, e.g., when experimenting over high-speed LANs
– RawSocket for sending and receiving IP (UDP) packets over raw sockets
• Suitable, e.g., when experimenting in the PlanetLab testbed or VPNs
• IP network used as an underlay
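(Figure: network packet layout. Recoverable fields: a LIPSIN Identifier of LID size, fields of size 1, a sequence of IDs (ID 1, ID 2, …, ID n), and the Payload.)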
• Receives packets from the network communication elements
– Matches the FID with all outgoing links and forwards the packets
– A separate LID is assigned to the “internal link” between the Forwarding element and the Local Proxy Element
• Implements the notion of destination
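The FID matching step can be sketched as the Bloom-filter test used in LIPSIN-style forwarding: a packet is sent on a link when every bit of that link's LID is also set in the FID. The 8-bit LIDs and link names below are hypothetical and far shorter than real identifiers.

```python
def build_fid(link_ids):
    """OR the LIDs of all links on the delivery tree into one FID."""
    fid = 0
    for lid in link_ids:
        fid |= lid
    return fid


def matching_links(fid, links):
    """Return the names of links whose LID is fully contained in the FID."""
    return [name for name, lid in links.items() if (fid & lid) == lid]


# Hypothetical outgoing links of one node, including the internal link
# towards the local proxy.
LINKS = {
    "eth0": 0b00010010,
    "eth1": 0b01000100,
    "local": 0b10001000,
}

fid = build_fid([LINKS["eth0"], LINKS["local"]])
print(matching_links(fid, LINKS))  # ['eth0', 'local']
```

Since the test is a subset check on bits, a link that was never added can still match by coincidence; such false positives are an inherent property of Bloom-filter-based forwarding.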
• Click configurations – can be auto-generated
Forwarder (MAC, 1,
1, 08:00:00:00:00:01, 08:00:00:00:00:11, 1000000000000000000000000000000000000000000000000000000000000000
1, 08:00:00:00:00:02, 08:00:00:00:00:12, 1000001000000000000000000000000000000000000000000000000000000000
2, 08:00:00:00:00:03, 08:00:00:00:00:13, 1000001000000000001000000000000000000000000000000000000000000000
);
fw[1] -> Queue(1000) -> ToDevice(eth0);
fw[2] -> Queue(1000) -> ToDevice(eth1);
FromDevice(eth0, SNIFFER false) -> Classifier(12/080a)[0] -> [1]fw;
FromDevice(eth1, SNIFFER false) -> Classifier(12/080a)[0] -> [2]fw;
Forwarder (IP, 1,
1, 192.168.0.1, 192.168.0.2, 1000000000000000000000000000000000000000000000000000000000000000
1, 192.168.0.1, 192.168.0.6, 1000001000000000000000000000000000000000000000000000000000000000
2, 192.168.1.1, 192.168.1.2, 1000001000000000001000000000000000000000000000000000000000000000
);
fw[1] -> Queue(1000) -> RawSocket(UDP) -> IPClassifier(dst udp port 9999)[0] -> [1]fw;
fw[2] -> Queue(1000) -> RawSocket(UDP) -> IPClassifier(dst udp port 9999)[0] -> [2]fw;
• “The heart of a network node” – everything goes through it
• Receives all pub/sub requests from applications and other Click elements
• Keeps track of
– Pending subscriptions
– Advertised information items (and assigns FIDs)
• Receives
– Published data and notifications about new or deleted scopes
• Pushes these packets to subscribers (applications or Click elements)
– Notifications to start or stop publishing data
• Pushes these packets to one (of the potentially many) publishers
• The same element runs in all nodes
• Every node can create an information structure that will be known and maintained by the local RV function
• Other nodes can send pub/sub requests to that node if they know a path to it
• Usual scenarios
– A network node (its RV function) maintains a local structure for IPC (node-local strategy)
– A network node (its RV function) maintains a structure accessible by physical neighbours (link-local strategy)
– One or more dedicated RV nodes run in a domain – end hosts know how to reach them (domain-local scenario)
• The RV Element accesses the world the same way applications do
• It subscribes to root scope FFFF where all pub/sub requests are published
• It publishes Topology Formation requests to scope FFFE, to which the TM has subscribed
• Topology formation is required when:
– A set of publishers needs to be notified with Forwarding IDs that point to a set of subscribers
– A set of subscribers needs to be notified about a new or deleted scope
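The matching that triggers these notifications can be sketched as a toy rendezvous table. This is a simplification under stated assumptions: the class `Rendezvous`, its method signatures, and the event tuples are hypothetical, every known publisher is notified, and no Forwarding IDs or Topology Manager requests are modelled.

```python
START_PUBLISH = "START_PUBLISH"
STOP_PUBLISH = "STOP_PUBLISH"


class Rendezvous:
    """Toy matcher of publishers and subscribers for information items."""

    def __init__(self):
        self.publishers = {}   # item ID -> set of publisher names
        self.subscribers = {}  # item ID -> set of subscriber names
        self.events = []       # (event, publisher, item ID) notifications

    def publish_info(self, item_id, publisher):
        self.publishers.setdefault(item_id, set()).add(publisher)
        if self.subscribers.get(item_id):
            self.events.append((START_PUBLISH, publisher, item_id))

    def subscribe_info(self, item_id, subscriber):
        subscribers = self.subscribers.setdefault(item_id, set())
        first = not subscribers
        subscribers.add(subscriber)
        if first:  # the first subscriber makes publishing worthwhile
            for publisher in sorted(self.publishers.get(item_id, ())):
                self.events.append((START_PUBLISH, publisher, item_id))

    def unsubscribe_info(self, item_id, subscriber):
        subscribers = self.subscribers.get(item_id, set())
        if subscriber not in subscribers:
            return
        subscribers.remove(subscriber)
        if not subscribers:  # the last subscriber has left
            for publisher in sorted(self.publishers.get(item_id, ())):
                self.events.append((STOP_PUBLISH, publisher, item_id))


rv = Rendezvous()
rv.publish_info("/0001/00B7", "pub1")    # no subscriber yet: no event
rv.subscribe_info("/0001/00B7", "sub1")  # triggers START_PUBLISH for pub1
rv.unsubscribe_info("/0001/00B7", "sub1")
```

In the real system the notification sent to a publisher would also carry a Forwarding ID that points to the subscribers, which is why topology formation is needed at this step.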
• An application
– Calculates shortest paths in a network to produce forwarding information
– Uses (e.g.) the igraph library for this
• How the TM does IPC
– Subscribes locally to scope FFFE
– Receives requests from the RV node as publications
– Publishes responses directly to publishers and subscribers using the Information ID /FFFD/destinationNodeID
– Utilizes an implicit rendezvous dissemination strategy where information is published with a specific FID
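How a shortest path turns into forwarding information can be sketched as follows. The Topology Manager itself uses a graph library such as igraph; this sketch swaps in a plain breadth-first search to stay self-contained. The topology, the small integer LIDs, and the function names are hypothetical.

```python
from collections import deque

# Hypothetical topology: node -> {neighbour: LID of the link towards it}.
TOPOLOGY = {
    "A": {"B": 0b000001},
    "B": {"A": 0b000010, "C": 0b000100},
    "C": {"B": 0b001000, "D": 0b010000},
    "D": {"C": 0b100000},
}


def shortest_path(topology, src, dst):
    """Breadth-first search; returns the node list or None if unreachable."""
    previous = {src: None}
    pending = deque([src])
    while pending:
        node = pending.popleft()
        if node == dst:
            break
        for neighbour in topology.get(node, {}):
            if neighbour not in previous:
                previous[neighbour] = node
                pending.append(neighbour)
    if dst not in previous:
        return None
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


def path_fid(topology, path):
    """OR together the LIDs of every link along the path."""
    fid = 0
    for here, there in zip(path, path[1:]):
        fid |= topology[here][there]
    return fid


path = shortest_path(TOPOLOGY, "A", "D")
fid = path_fid(TOPOLOGY, path)
```

For delivery to several subscribers, the FIDs of the individual paths can be ORed together so that one identifier covers the whole delivery tree.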
• Currently 5 strategies are implemented
– These strategies are used for choosing the scope of information visibility in a network
1. Node-local
– IPC
2. Link-local
– A node can create information graphs (a) locally, accessible to physical neighbours, or (b) remotely, accessible to this node
– Link IDs are provided by applications
3. Intra-domain
– End-hosts use an FID to a dedicated RV to create information graphs and to subscribe to scopes and information items
– Publishers assign FIDs (to subscribers) to individual information items
4. Subscribe locally
– Do not send anything to any RV
5. Implicit rendezvous
– Publish the data immediately using the provided FID
• All network nodes run the same software
– Blackadder runs in user space or kernel space in the nodes
• Configurations can be different
– End-nodes are configured to have link access (LID) and access to dedicated rendezvous (RV) nodes (with an FID)
– Dedicated forwarding nodes run only the forwarding element
• And other elements if additional functionality is required (e.g. caching)
– Dedicated RV and TM nodes
• Any nodes can be RV nodes – an FID is required to reach them
• TM nodes run a Topology Manager (TM) application
– A deployment tool can be used for generating configuration files and deploying them in a network
– Network attachment component for dynamic settings
Publisher

    ba = Blackadder(True)
    ba.publish_scope(sid, "", DOMAIN_LOCAL, None)
    ba.publish_info(rid, sid, DOMAIN_LOCAL, None)
    ev = Event()
    ev.type = 0
    # Wait until a START_PUBLISH event arrives
    while ev.type != START_PUBLISH:
        ba.getEvent(ev)
    # Publish every line read from standard input
    while True:
        data = raw_input()
        ba.publish_data(sid + rid, DOMAIN_LOCAL, None, data, len(data))

Subscriber

    ba = Blackadder(True)
    ba.subscribe_info(rid, sid, DOMAIN_LOCAL, None)
    ev = Event()
    # Print the payload of every Published Data event
    while True:
        ba.getEvent(ev)
        if ev.type == PUBLISHED_DATA:
            print ev.data[:ev.data_len]
(This example uses a Python API that is wrapped on top of a C++ API library that translates API calls to messages that are passed through IPC sockets.)
• Open source (GPLv2 / BSD)
• Code, documentation, etc.
• http://www.fp7-pursuit.eu/
• https://github.com/georgeparisis/blackadder
• Current release: v0.2beta (in GitHub)
• Pub/Sub prototype that implements the core ideas from PSIRP
• Blackboard-based architecture
• Integrated with the OS kernel
– E.g., virtual memory management
• Objectives: efficiency, natural interface, object deduplication, etc.
• Works in FreeBSD
• A publication is an object in the blackboard – i.e., in the computer’s memory
– A (concept) publication is identified by a RId
– A version is a specific piece of data identified by a vRId
• version-RId: hash tree root
– A page is a block of data identified by a pRId
• page-RId: hash of content
• Sub-object relationships
– Concept publications can have several different versions
– Versions have a specific set of pages in a specific order
• Scopes are special publications that are identified by SIds and store collections of RIds
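The relationship between page-RIds and version-RIds can be sketched with a small hash tree. The slides only state "hash of content" and "hash tree root"; the choice of SHA-256, the binary tree shape, the rule of duplicating the last node on odd levels, and all function names are assumptions made for this illustration. The 4096-byte page size is the example value mentioned in the slides.

```python
import hashlib

PAGE_SIZE = 4096  # example page size from the slides


def split_pages(content, page_size=PAGE_SIZE):
    """Cut content into fixed-size pages (the last one may be shorter)."""
    return [content[i:i + page_size] for i in range(0, len(content), page_size)]


def page_rid(page):
    """pRId sketch: hash of the page content (SHA-256 is an assumption)."""
    return hashlib.sha256(page).digest()


def version_rid(page_rids):
    """vRId sketch: root of a binary hash tree over the ordered pRIds."""
    if not page_rids:
        return hashlib.sha256(b"").digest()
    level = list(page_rids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


pages = split_pages(b"a" * 5000)
prids = [page_rid(page) for page in pages]
vrid = version_rid(prids)
```

Changing any page changes its pRId and therefore the vRId, while untouched pages keep their identifiers, which is what makes page-level deduplication possible.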
• Publication
– A piece of content
– Related metadata
• Identifiers, size, type, …
• Objects have their own identifiers
– E.g. 256 bits; an opaque or a hierarchical structure
– Could be tied to the data and/or an entity
– Single global identifier space assumed (by default)
• Scope
– Collection of data publications (their IDs)
– Information aggregation, access control
• Data
– Placeholder for a “concept”, i.e., mutable content
• Version
– Immutable instance of a data publication
• Page
– A chunk of actual data (e.g. in the OS kernel or in network packets)
– E.g., 4096 bytes
(Figure: object hierarchy. A root scope (Scope 0) contains subscopes (Scope 1, Scope 2); scopes contain publications (Pub 1 to Pub 4); publications have versions (Version 1 to Version 5); versions consist of pages (Page 1 to Page 12, …).)
• Create
– Create a piece of content to be published
– I.e., allocate virtual memory objects for data and metadata
• Publish
– Make content available to others
– Results in a new version
• Subscribe
– Request and get content
• Register, Listen
– Get notified about publication events (e.g., when a new version appears)
(Figure annotations on the API call arguments: pointers to data and metadata of a memory object; identifies a scope; identifies a publication.)
(Figure: Blackhawk architecture. Components shown: data publisher and data subscriber applications using the pub/sub API; Pub/Sub API library; system call interface; file system and kernel events (kevents); Blackboard with internal data structures; kernel-level interface; virtual memory system; userlevel and kernel-level Click with kernel interface socket and userlevel interface fs; RZV client and RZV interface; Forwarder; network devices; TM; RZV node.)
• Motivation: we want to achieve efficiency, a natural interface, and object deduplication
• Existing FreeBSD VM system data structures utilized:
– vm_page_t
– vm_object_t
– vm_map_t, vm_map_entry_t
– ...
• In our system, each publication has one VM object for metadata and one for data
• Metadata object
– One page (currently)
– Object’s own ID, its size, etc.
– List of sub-object IDs
• Pub: versions
• Version: pages
• Data object
– Pages contain actual content
• Metadata and data objects mapped to applications’ memory spaces (when created or subscribed to)
• Data is copy-on-write
– Can be modified
• results in a new shadow object
• unmodified pages shared – don’t need to be copied
– Re-publishing results in a new version that can be subscribed to
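The effect of copy-on-write on versions can be sketched without any kernel machinery. The code below models only the outcome (unmodified pages are shared between versions), not FreeBSD shadow objects; `PageStore`, `publish`, and `modify` are hypothetical names, and SHA-256 is an assumed page hash.

```python
import hashlib


class PageStore:
    """Content-addressed page storage: identical pages are stored once."""

    def __init__(self):
        self.pages = {}  # page ID (hex digest) -> page bytes

    def put(self, page):
        prid = hashlib.sha256(page).hexdigest()
        self.pages.setdefault(prid, page)
        return prid


def publish(store, pages):
    """A version is an immutable, ordered tuple of page IDs."""
    return tuple(store.put(page) for page in pages)


def modify(store, version, index, new_page):
    """Re-publish with one page changed; all other pages are shared."""
    prids = list(version)
    prids[index] = store.put(new_page)
    return tuple(prids)


store = PageStore()
v1 = publish(store, [b"AAAA", b"BBBB", b"CCCC"])
v2 = modify(store, v1, 1, b"XXXX")
```

Both versions remain available to subscribers, and the store holds four pages rather than six because the two unmodified pages are referenced by both versions.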
• Each publication has a corresponding vnode in the kernel
• Applications get an open file descriptor in the “handle”
– After publish or in subscribe
• Enables the use of kevents
– We use it to get notifications when somebody publishes (or subscribes to) something
• A new file system type, psfs
• File system view to the blackboard
– E.g.: /pubsub/sid/rid/vrid/prid/data
– Data/metadata can be accessed on different levels in the object hierarchy
– In theory, we can also map file system ops to pub/sub ops
/pubsub
/sid1
/sid2
/rid1 data meta
/vrid1
...
• Could be used for enabling demand paging over the network as well
– Together with a pull-based caching-enabled transport protocol
• Publication Index (pubi)
– Each scope, data publication and version (and page) has this small additional data structure for auxiliary in-kernel metadata
– Holds pointers to metadata and data VM objects and a vnode, filesystem-related information, etc.
• Publication Index Table (PIT)
– UMA zone-based storage
– Hash table with ID → pubi mappings
– All identifiers are accessible on the same hierarchical level
– Used for (recursive) object lookups in the blackboard
• ID → pubi → metadata and/or data → sub-obj. ID → …
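The recursive lookup chain can be sketched with a flat dictionary standing in for the PIT. The `Pubi` record, its fields, and the example identifiers are hypothetical; the real pubi holds pointers to VM objects and a vnode rather than Python values.

```python
from dataclasses import dataclass, field


@dataclass
class Pubi:
    """Stand-in for a publication index entry."""
    kind: str
    sub_ids: list = field(default_factory=list)  # sub-object IDs from metadata
    data: bytes = b""


# Flat table: scopes, publications, versions and pages share one level.
PIT = {
    "sid1": Pubi("scope", ["rid1"]),
    "rid1": Pubi("publication", ["vrid1"]),
    "vrid1": Pubi("version", ["prid1", "prid2"]),
    "prid1": Pubi("page", data=b"hello "),
    "prid2": Pubi("page", data=b"world"),
}


def resolve(pit, ids):
    """Follow ID -> pubi -> sub-object ID -> ... and check every hop."""
    entry = pit[ids[0]]
    for next_id in ids[1:]:
        if next_id not in entry.sub_ids:
            raise KeyError(f"{next_id} is not a sub-object of the previous ID")
        entry = pit[next_id]
    return entry


def read_version(pit, sid, rid, vrid):
    """Concatenate the pages of one version, in order."""
    version = resolve(pit, [sid, rid, vrid])
    return b"".join(pit[prid].data for prid in version.sub_ids)


content = read_version(PIT, "sid1", "rid1", "vrid1")
```

Keeping every identifier in one flat table means a page or version can also be looked up directly by its own ID, without walking down from a scope.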
(Figure: PIT pages hold ID → pubi entries; each Publication Index (pubi) holds a pointer to a metadata page, which contains the ID, size, sub-object count, etc., followed by the sub-object IDs.)
Publication type   Identifier   Metadata   Data
Scope              SID          VRIDs      RIDs in scope
Concept (Data)     RID          VRIDs      Newest version
Version            VRID         PRIDs      Immutable data
Page               PRID         -          Immutable page
(Figure: message sequence among nodes labelled P, R, and S. Recoverable labels: Publish; Subscribe; Subscriber set update (MD SUB); Version metadata; Rendezvous (RN); Data subscription (DS); Data (DP); RC.)
• Native C API: the libpsirp pub/sub library
• Wrappers for Python and Ruby
– Generated with SWIG and additional C and Python/Ruby code
– The API for Python is object-oriented
• Header
– #include <libpsirp.h>
• Types
– Identifiers: psirp_id_t (array)
– Handle: psirp_pub_t (pointer)
• Primitives
– psirp_create(), psirp_subscribe(), psirp_subscribe_sync(), psirp_publish(), psirp_free()
• Accessors
– for data, length, identifiers, fd, …
– psirp_pub_data(psirp_pub_t pub), psirp_pub_data_len(psirp_pub_t pub), …
• Events
– psirp_kq_t
– or standard kqueue() and kevent() calls
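#include <stdint.h>
#include <libpsirp.h>

void pubsub(psirp_id_t *sid, psirp_id_t *rid) {
    psirp_pub_t pub;

    /* Subscribe to the publication; give up if the call fails. */
    if (psirp_subscribe(sid, rid, &pub, 0x0) != 0)
        return;

    /* Modify the first two bytes of the mapped data. */
    uint8_t *data = psirp_pub_data(pub);
    data[0] = 'a';
    data[1] = 'b';

    /* Re-publish the modified content, then release the handle. */
    psirp_publish(sid, rid, pub, 0x0);
    psirp_free(pub);
}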
• Open source (GPLv2 / BSD)
• Documentation, source code, VM images, etc.
• http://www.psirp.org
• http://code.psirp.org
• http://users.piuha.net/blackhawk/
• Current release: v0.3 – in this presentation we described a more developed version
• Two information-centric pub/sub prototypes
• Different approaches
– Channel vs Document
– Not presented: Algorithmic IDs
• Blackadder implements PURSUIT’s functional model
• Blackhawk implements PSIRP’s memory object model
• Similar APIs, similar architectural components
– Ongoing work: Integration