Judith Bishop: So good morning, everybody. I'm Judith Bishop, and I'm chairing this session on Cloud Futures. We're delighted to introduce here Antonio Cisternino, and he's going to be giving a paper for himself and his colleague, Mark O'Meara [phonetic], on managing cloud infrastructure. They come from the University of Pisa, and welcome to them. [applause]

Antonio Cisternino: Thank you. So today I'm going to give you a brief introduction to a system we're working on in Pisa called Octopus. It is mostly about managing virtual machines. So it's not really about cloud, but more about how to manage the huge datacenters we heard about in the last presentation today and also yesterday.

So virtualization is quickly becoming a key element in computing infrastructure, for many reasons. One of them is that you can make better use of your hardware: if you have one server, you can partition your services into different systems. You can achieve better usage of your CPU cycles, and so you're going to be more efficient with respect to energy. Octopus was born from a collaboration with [inaudible] raising the vision, and the idea is how to manage a cluster of virtual machines in an efficient way.

So what is cloud about? Cloud is about data; data is a really important aspect. There are services; you run services on cloud. It is about heterogeneous systems, especially if you run private clouds, because you should expect to have heterogeneous hardware over time. And reliability, which is important, because if you buy services from a service provider, the provider must be reliable. It's about computing centers. Many of these things are suitable for virtualization. And as Dan Reed told us this morning, virtualization is widely available: we have very powerful computer systems that give us more horsepower than is often needed, so that's fine.

And let's start with a demo. I know that this is pretty unusual, but it's difficult to define what Octopus is otherwise. So Octopus is something like this. This is a cluster in our IT system [inaudible] in Pisa; it is an HP blade system with eight blades, so 32 cores, and so we're able to run up to 32 virtual machines on top of that. Each node has 8 gig of RAM. It's a fairly capable cluster. So I log onto the system, let's say, using my account -- oops. And what I get is my virtual machines. So the idea is that users get access to their own virtual machines and are able to manage them from the web. So I can go here and say new: I want a Linux box with one core and one gig of RAM called hello, and the machine name is [inaudible]. And the password is qwerty. I'm going to remove this.

So what's going on? We are running on a distributed network of hypervisors running Microsoft Hyper-V Server, and we are able to interact programmatically with the Hyper-V infrastructure and allocate virtual machines dynamically on nodes. So here I had already created virtual machines, and here there is a third one, which is the one I've just created. Under the hood the operating system is booting, and at the end of the boot the image is configured so that it integrates with the Octopus database management, so it actually gets all the data that you need for managing the virtual machine. In this way every user has access to his virtual machines and can suspend a virtual machine, like I'm doing with this Win 7 box; I can stop it, so unplug the power; and I can migrate the virtual machine.
So, actually, this migration is more a test, because one of the features of the system is that the virtual machines are running in a cloud: you don't really know where a virtual machine is running. When I click this button, which we use for testing, the virtual machine gets live-migrated onto another node. And this is very important for several reasons, mostly management reasons, because it allows system admins to move virtual machines around and get access to the physical hardware without interrupting services. Moreover, you can also pack computations onto single nodes and shut down unused nodes, so you can be greener in your power management.

As you can see here, we were able to get this screen shot of the -- okay, something -- this is the small one. This is the screen shot of a running virtual machine. So these three dots are the log-in of the [inaudible] server, as it is here for this one. And this virtual machine is now up and running, so you can access it. This is the screen shot. And if I go back and refresh the [inaudible] machines, I can obtain the IP address of my newly created virtual machine and I can log into it. So now I can log in with my user name, [inaudible], and the password qwerty. Okay, qwerty. There we go. So this is a brand new virtual machine that has been created on the fly, and it's fairly useful, because if you have to run a lot of computation, you can install images easily with this. And at the end of the day, you can decide to turn it off -- oh, yes, I want to -- and delete the virtual machine, so you recover all the computing resources the virtual machine needed. So I'm going to unplug and turn off the machine and delete it. Okay. So this is what Octopus is about.

The nice thing is that all the interface you saw, which has been inspired by the Windows [inaudible] 7 series that is about to be released, works in a standard browser, so you can also access and monitor your virtual machines from a standard phone, and you have access to all the information you need.

Okay. So what is the architecture of the system? The system has a number of storage units here, which can be single PCs or whatever -- you can have many more of them, not just one; it's up to you to decide your own architecture, and you can use spare desktops or whatever -- and then you have computing power, which is provided by, I don't know, 1U servers, desktops, or better infrastructure such as blades. And Octopus is simply software that coordinates the resources. Actually, all the other software we are using is production quality -- I mean, it's Microsoft Hyper-V Server 2008 -- and we've been able to use all the services you get from it.

Okay. So the software. The infrastructure has been realized using Windows Server 2008 Hyper-V, Windows Active Directory, and the Octopus services that we have developed. As for the guest operating systems, we have integrated them with the system, which mostly means you have to do one last step at the end of the configuration of a system to integrate it with the Octopus server, so it gets your IP and everything. We've been able to run Linux, which is supported by Hyper-V; Win7, which has been really [inaudible] because we're starting to run our administration offices on virtualized boxes; and Windows Server 2008. And we've recently upgraded to Windows HPC services, so we've been able to use the Windows HPC [inaudible] to share computation on virtual machines using this. So what is the structure of the system?
So the system is mostly built on top of standard interfaces, so these are robust and everything: the WMI Hyper-V calls, which are available; shared storage interfaces; and DHCP networking and DNS. Actually, if you have the Windows DNS, you can use calls to configure nodes on the fly and have publicly available virtual machines. We are also using tunnelling facilities to tunnel remote desktop and SSH connections into a private network. But, anyway, most of this is [inaudible] sites there or the configuration.

Then there is the Hyper-F library, which is available on CodePlex and is a port of the [inaudible] calls on top of F#. So actually everything has been implemented using the F# language. Using Hyper-F you're able to live-migrate, and you can even do hard checkpoints of running virtual machines. So if you have really long-running jobs, you can hard checkpoint them with [inaudible], which takes around one minute. And then on top of that there is the Octopus system, which is the web-based interface you've seen, and it tries to convey the idea that managing the resources should be easy, not with a [inaudible] that assigns nodes manually and everything. So we're really trying to do better.

We relied upon a really nice feature of Hyper-V, which is differencing disks. Using this approach, we've been able to pack many, many images into a single image, because you can put one disk on top of another disk, a sort of copy-on-write. So a newly created virtual machine requires only the few bits for the operating system to start, and it creates the paging file and small differences on top of the base image, which stays the same for all. Actually, a Windows Server 2008 instance costs us around 400 megabytes. And since differencing disks can be layered, so you can build a differencing disk on top of another differencing disk, we were also able to do snapshotting. So if you have a virtual machine and you want to try something on that machine, you can take a snapshot, have a new instance which is a photograph of the currently running machine, do whatever you want, and backtrack if you're not satisfied with the changes.

There is a really old refrain, which comes mostly from the Lisp community, which says that memory management is too important to be left to programmers, versus memory management is too important to be left to the system. C programmers were against automatic memory management by the system, while Lisp programmers were against manual memory management. At the end of the day, we can say that now we mostly have garbage collection, so automatic memory management. And I think it is the same for virtual machine management, for system management, because as the numbers grow, humans tend to be fault prone when they manage large numbers of objects. So we need systems that take care of and schedule resources for us. So management is important.

Octopus features three interfaces: the web one you just saw; one for sysadmins, which will also be a web interface; and F#, because since everything has been implemented in F#, you can use F# Interactive, which allows you to interactively call F# functions to manage your virtual machines in a sort of shell. And then you can write programs using F#. So you can manage and schedule your virtual machines as much as you like; it's up to you to decide the best way to do it. So we're still working on Octopus.
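As a concrete illustration of that third interface, here is a minimal F# sketch of the kind of scripting the speaker describes. The types and functions are hypothetical stand-ins, not the actual Hyper-F or Octopus API (which is not shown in the talk); a real script would go through the Hyper-V WMI calls mentioned above.

```fsharp
// Hypothetical types standing in for Octopus's VM inventory; the real
// Hyper-F/Octopus API is not shown in the talk, so this is only a sketch.
type VmState = Running | Suspended | Stopped

type Vm = { Name: string; Node: string; State: VmState; IdleMinutes: int }

// Stand-in for a remote suspend call (in reality a Hyper-V WMI operation).
let suspend (vm: Vm) =
    printfn "Suspending %s on node %s" vm.Name vm.Node
    { vm with State = Suspended }

// Example inventory; Octopus would read this from its management database.
let vms =
    [ { Name = "hello";   Node = "blade1"; State = Running; IdleMinutes = 0   }
      { Name = "win7box"; Node = "blade2"; State = Running; IdleMinutes = 180 } ]

// "For each virtual machine, do something": here, suspend VMs that have
// been idle for over an hour, the kind of rule discussed later in the talk.
let updated =
    vms |> List.map (fun vm ->
        if vm.State = Running && vm.IdleMinutes > 60 then suspend vm else vm)
```

Pasted into F# Interactive, each binding can be evaluated one line at a time, which is the shell-like workflow being described.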
And the next step -- so actually we built the infrastructure, but the next step will be to implement a policy manager. We're looking for policies to manage your virtual machines, such as: a user can instantiate at most ten virtual machines, each running for at most 15 days, and after 15 days they will be automatically erased, so deleted, and you can recover the resources; or whenever a non-server virtual machine goes idle, suspend the virtual machine so you can recover computing cycles. So the inputs are the virtual machine status, user-based policies, and sysadmin-based policies, and the actions you can take are: shut down machines, delete machines, and move machines across nodes -- for instance, for packing computations and being more power efficient. The scheduler will be the most critical part of this, and we are attempting to implement an intelligent shell. We're trying to implement non-trivial policies and move machines around, not just for fun but because machines are really expensive.

So we recently performed some green computing experiments. We did a nice job of measuring how much energy a [inaudible] requires to run, and during this experiment we ran into a pretty interesting fact. These graphs show that the average power absorption -- these are watts, and this is the input size for an algorithm -- changes depending on whether you are using one, two, three, or four cores of a processor. It turns out that the waste of energy if you use just one core on a 4-core machine is up to 10 percent. So it's very expensive to have standard machines idling. Until this wonderful parallel programming takes over, we still have to deal with a lot of single sequential programs, and virtual machines are a viable way to have many of them, up to four, running on a quad-core machine, and the benefit you're getting is that you're making better use of your power resources.

Okay. So Octopus is mostly a virtual machine scheduler -- sorry, go back -- built on production-quality components; it's just a scheduler. The virtual machines are accessible through the standard Microsoft interfaces, so you can start using it, and in the worst case you still have all your virtual machines and you simply give up on using the system. And it mostly eases the management of cloud computing resources, because since we got this system up and running, the number of virtual machines we have created has been incredible. I mean, I never dreamt of doing it manually, but now when I do development, I have a preconfigured image with Visual Studio 2010, and then I create a differencing disk on top of that, do my development, turn it off, and create a freshly installed machine whenever I need one. And we believe that it may contribute to achieving better usage of computation resources. So with this, I'm done. And the system is actually open source on octopus.codeplex.com if you're interested.

Judith Bishop: So we have time for a couple of questions. Questions? Yes?

>>: Excuse me. How many virtual machines can you [inaudible]?

Antonio Cisternino: How many?

>>: How many virtual machines?

Antonio Cisternino: There is no upper bound, because we use [inaudible] code that allows you to manage a Windows Server hypervisor remotely, so you can instantiate as many host machines as you want, configure them onto the scheduler, and the scheduler simply does remote calls.
So it's up to you to [inaudible] the system, but there are no implicit bounds on the number.

>>: [inaudible].

Antonio Cisternino: Sorry?

>>: [inaudible].

Antonio Cisternino: No.

>>: [inaudible].

Antonio Cisternino: Yeah, I mean, we looked at systems like those. Actually, the original goal of Octopus was to build an experimentation setup to do smart scheduling for power-consumption reasons. So we were interested in laying the groundwork of an infrastructure that was programmable.

>>: [inaudible].

Antonio Cisternino: Yeah. So the idea is that with a few lines of F# code you can say, for each virtual machine, do something, and so on.

>>: [inaudible].

Antonio Cisternino: Not in this way. We have more access to the underlying systems. Thank you. [applause]

Judith Bishop: Okay. So we're going to continue this session, and we're still in Italy, just down the road from Pisa: we're going to the University of Bologna, and here we have Fabio Panzieri, who's going to tell us about quality of service aware clouds. Thank you.

Fabio Panzieri: Thank you. Good morning. I'm going to report on an experimental exercise we're carrying out in my department with a couple of colleagues of mine -- actually, two former students, both of them, of mine -- on adding quality of service tools and software within cloud computing environments.

This talk is organized this way. I will first motivate why we're doing this, and then I'll try to clarify what we mean by quality of service in cloud computing, in particular what the role of service level agreements is in this context, and what the earlier work is on which we base our current research. Then I will illustrate the architecture we've proposed, and I will eventually tell you about the experimental evaluation results we have come up with, which is what I'm mostly interested in discussing with you. And, finally, I will conclude this talk by highlighting some of the future developments that we think are relevant in this particular context.

We are all familiar with the notion of cloud computing, particularly after the second day of this meeting, and so I will not dwell on the software as a service, platform as a service, or infrastructure as a service notions. What I think is relevant to say is that, as cloud computing is essentially what is summarized in this line by Ian Foster [inaudible] -- that services are delivered on demand to external customers over the internet -- it's worth stressing that quality of service is becoming a crucial factor for the success of cloud computing providers. If the cloud computing environment does not deliver the expected quality of service, then the reputation of the cloud computing provider can be tarnished, and then there can be, of course, financial losses. The motivation is essentially that.

By quality of service in a cloud computing environment, we mean compliance with the service level agreements that an application using a platform constructed out of cloud computing resources obtains from that particular infrastructure. In this context we usually talk about response time, throughput, error rate, and parameters such as these, but there are additional non-functional requirements that can be considered in assessing quality of service, and these include scalability or availability. As far as this particular exercise is concerned, we addressed mostly response time as the quality of service guarantee we wanted to evaluate.
However, what I want to mention again is that, to the best of my knowledge, quality of service in cloud computing is not yet sufficiently investigated, although we have observed a sort of growing interest in both the industrial and academic research communities in this particular issue of [inaudible] provision.

We come from the distributed systems community, and the basis of our work relates to a recently terminated project which had to do with providing quality of service support in a distributed computing environment constructed out of clustered application servers. Essentially we tried to reuse the approach we had in that project within this new context of cloud computing. In addition, we are looking very carefully at the results that a project currently funded by the European Community, called Reservoir, is producing. I shall briefly summarize these two projects just to set the context.

The TAPAS objective was that of developing a family of middleware services that could make Java 2 Enterprise Edition technology QoS aware, that is, capable of meeting SLA requirements, service level agreements. To this end, what we did was essentially to extend one particular implementation of Java 2 Enterprise Edition, called JBoss, which is open source, with three principal additional services -- a configuration service, a monitoring service, and a load-balancing service -- incorporated in the platform, and essentially they manage the platform dynamically. As the load on the application hosted on the platform augments, the configuration service enters into action and reconfigures the platform to cope with the augmented load. In contrast, when the load diminishes, the resources added to the platform are released. So the effort of this particular architecture is to optimize the use of resources in a distributed computing environment, the same sort of principle we wish to apply to cloud computing. And in particular, we do that -- well, I'm sure you know this -- by adding those services to the architecture that is being proposed within the context of the Reservoir project.

Reservoir is a project which is led by IBM and looks at providing support for cloud federations. In addition, one of the further aims of this project is providing interoperability and business service management. The architecture proposed within Reservoir is that of a system structured in three hierarchical levels of abstraction: a top level known as the service manager, which is responsible for deploying the application on the basis of what they call a service manifest, which is a new version of a service level agreement; the service manager is implemented on top of what they call a virtual execution environment manager, which is responsible for coordinating the distributed virtual execution environment hosts, which are basically the various resources deployed on the single nodes in a distributed environment. At the moment, as far as I know, only the lowest level and the virtual execution environment manager have been implemented. The project is still ongoing; I think it will terminate in a couple of years. At least it's due to terminate in a couple of years.
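To make the TAPAS-style behavior described a moment ago concrete, here is a minimal, self-contained F# sketch of such a configuration service: grow the platform when the load rises, release nodes when it falls. The per-node capacity figure and the types are illustrative assumptions, not the project's actual code.

```fsharp
// Illustrative platform model; capacityPerNode is an assumed figure,
// not a number from the TAPAS project.
type Platform = { Nodes: int; MaxNodes: int }

let capacityPerNode = 50.0  // requests/sec one node is assumed to absorb

// Reconfigure toward the smallest node count that covers the current load.
let reconfigure (load: float) (p: Platform) =
    let needed = int (ceil (load / capacityPerNode))
    let target = max 1 (min p.MaxNodes needed)
    if target > p.Nodes then
        printfn "Load %.0f req/s: adding %d node(s)" load (target - p.Nodes)
    elif target < p.Nodes then
        printfn "Load %.0f req/s: releasing %d node(s)" load (p.Nodes - target)
    { p with Nodes = target }

// A load that rises and then diminishes; the platform follows it.
[ 40.0; 120.0; 260.0; 90.0 ]
|> List.fold (fun p load -> reconfigure load p) { Nodes = 1; MaxNodes = 13 }
|> ignore
```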
Our approach is to extend that architecture to incorporate in the service manager those services that we developed for the TAPAS project, that is, the middleware that looks after configuring the platform, controlling compliance with the service level agreement established by the application hosted on the platform, and monitoring the current execution of the application.

We wanted to evaluate this architecture, basically, and see whether we've got it right or wrong, and so we examined our architecture in a scenario which is made out of a pool -- I'm looking for a pointer, but I can't find it; no, there isn't one. We're assessing the architecture in a scenario in which we have a pool of available -- I'm assuming free, not used -- virtual machines that can be instantiated and executed on demand. Each virtual machine comes with a fixed quantity of resources -- CPU, RAM, storage -- and can execute scalable services on a pay-as-you-go accounting principle. In our exercise, what we wanted to do is basically assess the cost of allocating resources, allocating virtual machines, in order to get some indication as to what would be the better configuration and load-distribution policies we could deploy in a context such as this one. And in particular, we would like to devise dynamic configuration policies that do not violate the service level agreement.

Evaluating our architecture in this scenario has turned out to be quite difficult: on one hand because of the actual complexity of implementing our architecture, which requires some time and investment that we didn't have available at the moment we went into this exercise, and in addition, we didn't have available a cloud computing infrastructure that we could use for our purposes. So we evaluated our architecture through a simulation exercise. We implemented a prototype of the service system and, using a request generator and a response generator, we obtained some performance results that I'll show you in a moment. So basically it is, as I said, an initial exercise that has provided us with these results.

We assume that an application is deployed on the platform under some service level agreement that specifies its own quality of service requirements, and what is agreed in the negotiation that occurs in practice is that the service level agreement can be violated for a certain percentage of the time. We assume that that percentage is five percent; so what is called SLA efficiency equal to 95 percent means that it is acceptable for the SLA to be violated five percent of the time. We also assumed that the allocation time of a virtual machine is two seconds. This is a very short allocation time; it is reported in the literature -- there is this paper by Sotomayor [inaudible] -- that the allocation time can reach even 400 seconds. But we were not currently interested in designing policies for bootstrapping virtual machines quickly; rather, we wanted to see whether we could stick to an SLA as far as the response time of the application goes. We assumed that the virtual machines essentially stand by and can be configured into the platform on the fly.

So this is the first set of results that we obtained. We have a number of nodes which is very limited -- it goes from 1 to 3 only in this first exercise. We set the response time to 200 milliseconds.
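Before the graphs, here is a small F# sketch of the trigger rule these experiments use, with the numbers from the talk: a 200-millisecond negotiated response time, a threshold held below it (the exact margin is an assumption, since the talk only says it sits below the SLA), and reconfiguration once more than five percent of recent requests exceed the threshold.

```fsharp
// Numbers from the talk, except thresholdMs: the talk only says the
// threshold sits below the negotiated time, so 180.0 is an assumption.
let negotiatedMs     = 200.0
let thresholdMs      = 180.0
let maxViolationRate = 0.05   // SLA efficiency of 95 percent

// True when recent response times breach the threshold too often,
// i.e. when the platform should be reconfigured.
let needsReconfiguration (recentMs: float list) =
    let violations = recentMs |> List.filter (fun t -> t > thresholdMs)
    float violations.Length / float recentMs.Length > maxViolationRate

// Example: 2 of 10 samples exceed 180 ms, a 20% rate, so reconfigure.
let example =
    needsReconfiguration
        [150.0; 160.0; 190.0; 170.0; 140.0; 210.0; 165.0; 155.0; 175.0; 160.0]
```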
As you can see, the green line indicates the threshold: when the response time goes over that threshold more than five percent of the time, the platform is reconfigured. And the threshold is below the actual response time negotiated in the SLA because we maintain a margin, just to guarantee that the SLA is not violated at all. Usually violating an SLA means economic penalties for the provider. So in order to make sure that the SLA is not violated, and at the same time to prevent the provider from over-provisioning resources just to stick to the SLA, we use this threshold, which is just below the actual negotiated response time.

As you can see, we used a load growing up to 90 requests per second, with an SLA limit of 100 requests per second, and the blue line that you see on this slide indicates when the virtual machines are allocated and released according to the load that is put on the application. This is the left-hand side graph. On the right-hand side diagram you see the violation rate, and under these particular circumstances you can see that the five percent SLA efficiency limit is maintained: the violation rate is always below the five percent which is permitted.

Then we thought that the number of virtual machines we were allocating was really very small, so we augmented the number of machines available up to 13 and maintained the same requirements as before, that is, a violation rate below five percent and a response time below 200 milliseconds. Even in this case you can see that the violation rate never goes above five percent, which is as expected, and the resources are added to the platform and released when necessary as the load varies.

Out of this exercise -- these are just very initial results, as I said -- we felt that they appear quite encouraging. That simply says that the design approach appears to be adequate. However, a number of problems remain open that I think are interesting to investigate, and I'm quite happy that in this workshop I found a sort of confirmation: some of the things that you see in these slides were actually written before yesterday, and I found out yesterday that they were confirmed by the presentations I heard.

One thing I wish to point out is that if we go for a dynamic configuration approach, then if we're going to manage a large number of virtual machines, we may hit a scalability problem -- not necessarily in the platform, but in collateral subsystems. If the virtual machines share a database, the database itself might become a bottleneck if you have a large number of these machines. Of course, one can replicate the database and distribute the load across the replicas, but then you hit the problem of maintaining coherence amongst the various replicas. So I think this is a problem that deserves attention and further investigation. In addition, there is this observation that we made, and I guess we share: the virtual machine allocation time can be very high and may lead to SLA violations. This goes to say simply that it's necessary to investigate virtual machine management and allocation policies that can prevent violating possible service level agreements. So we are planning to do further testing of our architecture using a real cloud as a test bed.
One of the candidates is OpenNebula, which is an open source cloud, but we hope that maybe after this workshop even Microsoft Azure might become available for some experiments. One of the things we would like to look at is extending the range of quality of service requirements we wish to consider, and in particular we would like to address dependability requirements such as fault tolerance or security. I think that analytical modeling of cloud computing environments, and consequently of our architecture, will be extremely useful to understand where to invest in order to make better progress on quality of service aware cloud computing environments. And we would like to look at the issues of cloud federations, and in that particular context we think that issues of trust and trust management will become particularly relevant. As was pointed out yesterday as well, in one of the invited keynote presentations, we also think that the integration of cloud computing with mobile devices and services is one of these very challenging scenarios that is worth investigating. That ends my talk.

Judith Bishop: Well, thank you very much for that interesting talk, including a lot of performance figures, which I think are important for all of us to know about. Questions? Yes?

>>: When scaling your VM allocations up and down, do you do this in increments of one machine? And do you think it would be useful to do it more aggressively, or would other --

Fabio Panzieri: What do you mean by more aggressively?

>>: For example, if you get really large load spikes and you really need, let's say, to increase the number of machines [inaudible] and a view of the load. Is that something that you're looking into?

Fabio Panzieri: Yes. In fact, it is something that we did in the other project, as part of the exercise we did before, when we implemented the dynamic configuration service. We did augment the number of nodes that were brought into the platform in much larger numbers than that.

>>: So if you look at [inaudible] offerings, you have a lot of different instance types you can acquire, so are you also looking into those, performance modeling or [inaudible]?

Fabio Panzieri: This is what we would like to do, if we can find the people who can actually work on that.

>>: [inaudible].

Fabio Panzieri: I think it's one of the very interesting topics.

Judith Bishop: Other questions? Okay. Great. Well, we'll thank our speaker, then, and -- [applause]

Judith Bishop: We're going on to the next talk, which is a change in the program. The next speaker is actually Sarunas Girgzijauskas from --

Sarunas Girdzijauskas: It's a very complicated name, I know. Girdzijauskas.

Judith Bishop: There you go -- from Sweden. So if you want to switch around, now is your chance to move. Okay. Welcome to the third speaker in the session on systems and infrastructure. We have Sarunas from the Swedish Institute of Computer Science in Stockholm, and he's going to talk to us about cognitive publish/subscribe for heterogeneous clouds. Over to you.

Sarunas Girdzijauskas: Thank you. So my talk will be a little bit different from what we heard before, because it's about a single current paper that we're working on in our institute with my Ph.D. student Fatemeh Rahimian, who did most of the work in simulating and experimenting with the system. So I'll be not as broad, but we'll go a bit into the most specific problems of this cognitive publish/subscribe system which we use for heterogeneous clouds. Okay.
So what is the future of the clouds? Although we saw these last two days that the future is with Microsoft, Amazon, Google, we have slightly different thoughts: maybe part of the future of the clouds will belong to decentralized architectures. It's not so surprising, because imagine how many good computing devices we have with us -- laptops, phones, iPads, whatever -- and, what's more important, they're getting better and better connected with each other. And that brings a whole new range of possibilities, where we could come with our devices and ask this cloud for resources, but also contribute our idle resources to the cloud. So we think that in the future there will be this network of connected devices forming various microclouds, which will be extremely heterogeneous, with very different computational capacities, different bandwidths, and different link costs between them, and we'll have to take that into account.

One of the main building blocks for such a decentralized architecture is a publish/subscribe service, and this is a very broad concept: you can imagine that users would like to subscribe to their favorite TV channels for IPTV, or maybe scientists would like to subscribe to certain Large Hadron Collider data. It's a very, very broad concept, but it has to work. And the whole system, if you want to embed it on top of this heterogeneous cloud, has to adapt to the topology, adapt to the existing connectivity, to use the cheapest paths. And not only that, it has to be cognitive and understand what the user patterns are. Of course, people who watch TV will probably have different patterns than scientists who are subscribing to some scientific data.

So our focus is on publish/subscribe systems which have a very large number of nodes and a very large number of topics to which you can subscribe, and which reside in heterogeneous environments. And basically we propose our solution for those cases where centralized solutions will not scale. Of course, there are many tradeoffs to consider. You can say, why not simply make an overlay for every topic a user subscribes to? However, in this case we might not be very scalable, because with a growing number of interests per user, you would need to have unlimited bandwidth or an unlimited neighbor set, and in this case we might end up scaling badly. The other extreme is to simply flood, or do some kind of flooding of, the events or the data through the system, but then we have the problem that there will be many relay nodes involved which will not be able, or will not want, to cooperate -- because remember that the amounts of data grow very fast, and if you have to relay some gigabytes of videos which are not in [inaudible], especially if you might need to pay your provider or something, you might reconsider that.

So we have to take these issues into account, together with the dissemination delay -- how fast you get the notification when the publisher issues it -- and, of course, what the cost is to get it. And that's, I guess, very important for the internet providers, because so far -- I guess here as well -- you pay a flat rate for the use of the internet, but the internet providers have actual costs associated with each of their network links. So this also has to be taken into account. So we will try to make a cognitive publish/subscribe system which has a fixed bandwidth, a fixed node degree.
We will not let it grow indefinitely. However, we would like to scale to any number of nodes or any number of topics, and to let the system, in a decentralized fashion without any global control, discover the underlying topology, find the cheapest paths in terms of bandwidth and cost, and minimize the number of these relay nodes by using subscription correlation patterns, because it is known that users do tend to have correlated interests.

If you look from the bigger perspective, the work that we've done is this: given these heterogeneous clouds, with a physical network with specific properties, we try to build on top of it a cognitive overlay, which takes into account all these properties that we want, and then, using the connectivity of this overlay, we build efficient dissemination structures for each topic. Since we use only the links of this overlay, and this overlay is scalable by definition, all our dissemination structures will be scalable as well.

So how do we do it? We employ a very nice gossiping technique for building these overlays. For those who don't know, gossip is just a very lightweight, robust, and scalable mechanism where peers talk only to their neighbors in the local vicinity and exchange information about the world. Usually they exchange the view of their neighbors, and then this propagation spreads and the peers can start forming different structures. With this type of mechanism, we build our overlay. It's maybe a bit more complicated than that, but in a nutshell it works like this. A node starts with a simple view of the neighbors -- it has some arbitrary neighbor set -- and it meets some other node. They exchange these views and merge them together, and then this merged view set is ordered at each peer by its own preferences. And these preferences are the most important thing here, because we can use some kind of ranking function. If we want to cluster peers with similar interests, a peer might rank the peers in a way that prefers peers with the most similar interests, or maybe peers which have the cheapest connection to each other. Then -- I forgot to tell you -- once we have this, every peer cuts a chunk of the new set, down to the limited neighbor set size that every peer assigns to itself; that's basically how we scale. And by repeating this process, usually in a very small number of rounds -- logarithmic in the population size -- we can converge to very nice structures where we have clusters of nodes which are similar to each other. And when they're similar, we can disseminate any events in these clusters very efficiently and cheaply.
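A minimal F# sketch of one such exchange, to fix the idea; the Peer type and view size are illustrative, and the ranking function is passed in as a parameter, since the talk stresses that different preferences give different structures.

```fsharp
// Illustrative peer: an ID plus the set of topics it subscribes to.
type Peer = { Id: int; Subscriptions: Set<string> }

let viewSize = 4  // fixed neighbor-set size, so node degree stays bounded

// One gossip exchange: merge the two views, drop duplicates and self,
// rank by the given preference function, keep only the top entries.
let gossipExchange (rank: Peer -> Peer -> float)
                   (self: Peer) (myView: Peer list) (theirView: Peer list) =
    (myView @ theirView)
    |> List.distinctBy (fun p -> p.Id)
    |> List.filter (fun p -> p.Id <> self.Id)
    |> List.sortByDescending (rank self)
    |> List.truncate viewSize
```

Running this repeatedly, with an interest-similarity ranking, is what lets clusters of like-minded peers emerge.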
So just to illustrate: say we have such a network, and there are, say, peers interested in the red topic. We apply this gossiping in the background, and as a ranking function we rank the peers with a similarity metric where we take, for example, the ratio between the intersection and the union of the subscription sets of two peers. That way, if two peers have identical subscriptions, the similarity will be 1; if they have no subscription in common, it will be zero; and if they have some common interests, it will be something between 0 and 1. If every peer does this and keeps for itself only those neighbors with the highest similarity, they will eventually converge to a structure where, for example, the red nodes form a cluster, the white nodes form a cluster, and the green nodes form a cluster. It might be that they won't form a perfect cluster, because they are only similar to a degree, but we do our best with the local knowledge to cluster them.

Not only that, we can also take into account the link cost. So, for example, if this link is very expensive, or starts to be very expensive, this background gossiping mechanism can rewire it, and the green topic will point to another peer which is maybe cheaper. We can also, with the same mechanism, cluster peers not only by link cost and interest similarity but also prefer clustering those topics which have a very high publication rate -- which are very popular -- because we would like to be more efficient for those topics which are expensive because they're very popular, and those which are not that popular we can maybe deal with somehow differently. And let's not forget that we always keep the degree of the graph, basically the bandwidth for every peer, limited. Every peer can decide this for itself, and we don't need to grow with the number of topics that we're interested in.

Now, there are problems with that. Because we have this limited node degree, it is inevitable that some clusters will be disjointed. They will be connected through some other peers, but just because we have a limit, they will be disjointed. And if you want to publish events on a topic, we will have to ask some other nodes to relay traffic, but we have to do it very carefully and involve as few of them as possible.

Now, how to do it? Basically, at this point most of the similar approaches stop, because either they cluster and then say that, well, you know, because of the correlation we might expect that the bandwidth will not grow very high, but they don't give any guarantees, or they just involve many peers in between. Well, we do it a bit differently. Having this overlay graph, which has a nice clustering phenomenon inside, we embed it into an identifier space, and in that way we make navigation in this graph possible, where any peer can find another peer just by greedy routing. We basically build a navigable small world network, for those who are interested; it's the work of Kleinberg, Jon Kleinberg, who first proposed it. What's even nicer is that we do it with the same gossiping mechanism. We don't employ another new technique; we just change the -- oh, sorry, I'll talk about this a bit later -- but basically, for embedding the structure, we just apply a slightly different ranking function in our gossiping technique.
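The two ranking ingredients just mentioned can be sketched directly, reusing the illustrative Peer type from the previous sketch. The similarity metric is the stated intersection-over-union of subscription sets; the greedy step for the identifier-space embedding picks the neighbor nearest to a target ID, and the one-dimensional integer distance here is an assumption for illustration.

```fsharp
// Intersection over union of subscription sets: 1 for identical
// subscriptions, 0 for none in common, in between otherwise.
let similarity (a: Peer) (b: Peer) =
    let inter = Set.intersect a.Subscriptions b.Subscriptions |> Set.count
    let union = Set.union a.Subscriptions b.Subscriptions |> Set.count
    if union = 0 then 0.0 else float inter / float union

// One greedy-routing step on a one-dimensional ID space: forward to the
// neighbor whose ID minimizes the remaining distance to the target.
let greedyStep (targetId: int) (neighbors: Peer list) =
    neighbors |> List.minBy (fun p -> abs (p.Id - targetId))
```

For the clustering phase, `similarity` is exactly the kind of function that can be plugged into the earlier `gossipExchange` as its ranking parameter.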
And once we have that, when we know how to navigate from any point to another point, then it's pretty easy -- well, I'll show -- it will be pretty easy to connect these topics and publish on them efficiently. So how do we build this navigable structure with the gossiping? We assign to every peer a random number, let's say some ID from some identifier space -- it can be one-dimensional or multi-dimensional -- and adapt the ranking function in a way that it not only chooses the neighbors with similar interests or cheap cost, as I showed before, but also, for one or two links, prefers neighbors which have an ID very similar to its own. And when you do this recursively, after several rounds everyone is connected to the closest neighbors in terms of ID and, in a sense, makes a ring. It's not a very nice ring here -- it's a bit twisted -- but in general it is a ring, and a ring allows us to navigate from any node to any other node, so we're always approaching the target. There are some other things to consider: in order to be more efficient, we can put in these small-world-style fingers -- I won't go into details -- but with this type of technique we can ensure that navigation in the network will be polylogarithmic in the size of the population. So basically, with this gossiping technique we can build an efficient navigation structure.

And once we have all these three types of links together -- the friend links, let's say, with which you cluster because of similarity, the ring links for navigation, and the long-range links for reaching far-away nodes efficiently -- then it becomes pretty easy. We just say: let's assign to every topic some rendezvous node. Usually it's easy: you take the topic name, and from it you can get some ID from the identifier space that we use, and then greedily route from every cluster to this rendezvous node, and ask every node that we traversed along the route to be involved in publishing this topic. We don't create any links -- the node degree doesn't change -- but in that way we can connect these components, so we retain this clustering phenomenon in the system while being able to navigate and connect them. Yes?

>>: [inaudible].

Sarunas Girdzijauskas: No, actually it gets through this node.

>>: [inaudible].

Sarunas Girdzijauskas: I don't know whether it's a very -- this is just for presentation purposes, but let's say you start from node 3, and node 3 knows that 5 is the rendezvous point. It would ask, which is the closest neighbor of mine -- 35, 28, 20, or, maybe here, I guess some small number -- and probably the small number is closest to 5. And when the message comes here, this node asks who is closest to 5, and it says, oh, I already have 5, and it gets there. And every node does the same. So this node 4, for example, also tries to get to 5, and it sees that 8 is the closest, goes there, and reaches 5. That's the nice thing about having a navigable structure: whenever you embed the graph, your initial connectivity, into the identifier space, you can exploit this greedy routing, and greedy routing is basically minimizing the distance to the target. And since you know the target, you can reach it.

>>: [inaudible].

Sarunas Girdzijauskas: So let's say that a topic is disconnected, and that topic has some name.
So if you devise a rule that every peer hashes the topic name, you get as a hash value some kind of number -- let's say in this case 5 -- and everyone can do it independently. So everyone knows independently that for the red topic, if you want to find it, you have to go to 5, and at 5 they all meet. There are some particularities I don't want to go into in more detail -- you can have several routes from every cluster in order to make it more robust, and how we find them; there are details, but in this short talk I cannot cover them. If you want, we can talk offline, but basically we do account for all of it. So in this case all the topics become connected, and now we can simply flood any event within the topic, and we are assured that we will involve only the nodes that are interested, plus the few other nodes which it is inevitable to involve in order to connect the disjoint components.

Since this is ongoing work, we don't have the full results so far, but if you're interested, I have some graphs, and the first experiments are really promising. We were working with many synthetic data sets, which actually required a little bit of work on how to synthesize the user subscription patterns and which method to use. We also [inaudible] a Twitter data set and used the graph of which people follow whom, and from that we induced what kind of subscription correlation is embedded there. For churn we used the Skype churn data that is available on the internet. And our experiments showed that in some cases, if you compare to existing approaches like Scribe or Bayeux -- those that do have a limited node degree but don't take into account the underlying network and the similarity of the peers -- we get even up to a 10-fold reduction of relay traffic. We involve a much smaller number of nodes. And if you think that this would translate to some gigabytes of data that you have to transfer, it can be a very big saving.

So to wrap up, we are working on this large scale pub/sub for heterogeneous environments, and the main idea is to make this cognitive overlay where we form the clusters of similar nodes in a decentralized fashion using gossiping, and by doing that, this overlay eventually converges to the most efficient paths that exist in the network. So it could be very nice for the internet providers, basically: if they deploy this system, and if you can somehow measure the cost of every link, it will converge to the least expensive paths to disseminate the data. And, of course, since this gossiping is always running in the background, whenever anything changes -- if there is an environment change -- we can always adapt and make new connections. And we showed that convergence is very fast and it is pretty robust to churn and failures. So with that, I finish, and I thank you very much. I'll be happy to take some questions.

Judith Bishop: Well, thank you very much for that most unusual talk. And are there questions? Yes, two there. You first.

>>: I'm a little confused about your use of gossiping. If you look into the routing literature [inaudible], usually gossiping doesn't reliably deliver information because [inaudible]. How do you deal with that?

Sarunas Girdzijauskas: So, actually, that's why we built this structured overlay on top of gossiping, in that sense. With gossiping, we actually build a structure. And even if you --

>>: How do you build a structure if you are not under control?
You have a distributed system.

Sarunas Girdzijauskas: You mean that there can be peers which maliciously drop the packets?

>>: Yes.

Sarunas Girdzijauskas: Well, okay, at the moment we don't deal with that. I mean, this is pretty much orthogonal research; there's a lot of research in the peer-to-peer community on how to isolate or overcome peers which decide not to conform to the existing rules. So in this respect we assume that the nodes will play according to the rules. If you want to implement this in real life, of course, you would need to take care of all these issues.

>>: I have a second question. What's your greedy [phonetic] function? Do you have some greedy routing implemented --

Sarunas Girdzijauskas: So once you have the identifier space, when every node has a --

>>: Distance [inaudible].

Sarunas Girdzijauskas: Yeah, exactly. So in that case you minimize the distance between the identifiers, in a sense.

Judith Bishop: There was a question here?

>>: Yeah. So [inaudible] is that a static structure or is it a dynamic structure [inaudible]?

Sarunas Girdzijauskas: Exactly. If nothing changes in the system -- if the costs are the same, if the peers don't change their interest patterns -- then eventually it will converge and stay the same. But if somebody at any moment in time says, okay, I'm not interested in this topic anymore, I am interested in that one, or there's new connectivity -- a provider installs a new fiber and then it's cheaper somewhere -- then the structure will change, adapt, and again converge to some state. Of course, in reality everything changes all the time, so you always adapt. That's why it's adaptive.

>>: Okay. Well, I think we must stop there because our last speaker is ready and poised.