>> Arkady Retik: Okay. Good morning. My name is Arkady Retik. I'm the chair for this
session. I'll also be the chair of the session this afternoon. And as you'll probably figure out, the unifying theme of this session is education, and the same will be true this afternoon. So we have three presentations now, and we have four more presentations related to education this afternoon.
Before we go to this particular session, just a very short announcement, especially for those faculty here who are not Microsoft people. If you registered for this conference within the last week -- well, you couldn't register, more specifically, because the registration was closed -- please pass your e-mail address to registration, because I think we're going to give all outside faculty access to Azure as a test environment. So I do encourage you to register and give your e-mail; otherwise you will miss this offer.
Now, as far as the presentations, we have three very interesting presentations today. All three are about education and about using the cloud to contribute specifically to universities and teaching.
One is towards predictable cloud computing by Andreas Polze, then there will be the cloud for a University City Campus by Danilo Montesi, and then the last one by Harold Castro on cloud computing projects in engineering.
So the first one is by Professor Andreas Polze. Actually, this presentation was submitted by two people; the other is Alexander Schmidt, here, who will be presenting another paper this afternoon. Andreas, who will be presenting this one, is professor of operating systems and middleware at the Hasso-Plattner Institute for software engineering, which is part of the University of Potsdam in Germany, and he is also head of the postgraduate Ph.D. school on service-oriented engineering there.
Andreas has a very distinguished record, and he has done research and teaching in many places, not only in Germany but also in the U.S., at Carnegie Mellon and Urbana-Champaign, among other places. He has also written many papers.
Interestingly enough, Andreas also took part in several Microsoft projects and technologies like Rotor, Phoenix, and, the last one, the Windows Research Kernel. He was very active and was a coauthor of the Curriculum Resource Kit that helps introduce operating system teaching into universities.
Part of this is also a contribution to the Windows Research Kernel, which is a source code offering for the Microsoft [inaudible] environment. Andreas' group did a lot of research with it, and I believe part of his presentation will be about this experience.
So please welcome Andreas here.
So just to let you know, every presentation is 20 minutes. I'll give a sign. And then we will leave five minutes for questions. Thank you.
>> Andreas Polze: Okay. Thanks for your introduction, Arkady. The title of my presentation today is Towards Predictable Cloud Computing.
And I want to start with this agenda today. First I want to talk very briefly about the Hasso-Plattner Institute and our group there and what we are doing in terms of research; then I'm going to talk about a couple of specific projects. One is called WRK, the Windows Research Kernel, which is a Microsoft thing, but we were actually the first site outside Microsoft that ever touched it.
I will briefly mention KStruct, which is the Ph.D. topic of Alexander Schmidt, who is here from my group, and I'm going to mention NTrace. Those are tools and approaches that are focused on monitoring and investigating a running operating system kernel.
And we will extend this topic a little and talk about resource management for service computing, which is kind of the cloud computing of yesterday, okay, [inaudible] and services [inaudible], and then we look into how this can be translated and applied to cloud computing. And we will finally talk about resource management for cloud services.
So what is the Hasso-Plattner Institute? It is a privately funded institute at the University of Potsdam. This is quite unusual for Germany, where you usually don't have private institutions. We have a full-blown computer science curriculum which focuses very much on software engineering, and we have this Ph.D. program called service-oriented systems engineering, which has now been running for five years.
Depicted here is Hasso Plattner, who was one of the cofounders of SAP, so we're living a little from SAP money, which is a good thing.
So if you look at the pyramid, you see the usual education. What stands out a little is this Ph.D. program. And we have a number of collaborations with other Ph.D. schools in the greater Berlin area, where there are about 12 schools in this area alone. And all these students focus on service-oriented systems engineering.
And if you look here, you see the topics of the different chairs, which are all looking at these underlying technologies or the system architecture from a software engineering standpoint.
And the goal of this research school is to bridge across all of this and basically look at all the different levels of this service stack, starting at the operating systems level, going over communication and middleware and security and workflow, up to applications. And also, orthogonal to this, looking at tools and approaches for building services in a distributed fashion.
So what I basically want to point out is we have quite some background in service computing
and research in this area from different viewpoints and angles.
If we come back to the operating systems and middleware group, then our research agenda looks more like this.
The anchor point is the middleware; the chair is actually called operating systems and middleware. And from middleware we go down to operating systems. There are a number of projects. The Windows Research Kernel is one of them, but there are also other collaborations with HP, looking into VMS, looking at HP-UX, also looking into realtime systems.
And from the middleware we go into embedded systems and we go into wide-area or distributed computing, which will be service computing or cloud computing.
And we have done research projects on the European level -- this Adaptive Services Grid used to be a European integration project -- but also projects with industry, and you see a number of industry partners here. This is basically my background.
Now let's talk a little about operating systems, which are becoming important again. This is actually an interesting thing. If you were talking to a computer science guy, or to an entrepreneur, even better: before eBay became ubiquitous, there used to be a platform called Alando. Alando.de was a German auction house, and it got bought by eBay. One of the founders gave a presentation at our school, and people asked what they needed to learn in order to be successful in business. And he said Java and XML. And this was the answer until maybe five or three years ago.
Now people understand that, because of this multicore and multithreading challenge, computers are not getting faster as fast as they used to. People understand that they need to look back into the operating systems, need to look back into computer architecture, need to focus on programming models and so forth.
And I think cloud computing is one of the tips of the iceberg where we see new computing models arise. But I also want to propose later on that there is more to come. So programming models, computing models -- there's a long road to go.
Again, operating systems are becoming important again. If you want to look at Windows, then we have put up a Web site, which is actually visible from the Internet, talking about the Windows Research Kernel. Also we have developed a couple of tools, like one tool called Pixer [phonetic], which provides a hyperlinked, annotated version of the Windows sources and is also on the Internet.
You see the URL at the top right, and there is a process of signing up; at some point you basically need to send an e-mail to Arkady. A number of schools are already using this. You can access the Windows sources and navigate, run student projects, and do experimentation in operating system space with an important product. This is just a screenshot showing this tool.
Whenever you investigate an operating system, you look at the sources. You use tools like a kernel debugger to understand what's going on, and you want to have additional tools which don't disturb the timing behavior of the system as much as single-stepping in a traditional debugger does.
Basically the biggest challenge is getting a consistent view of what's going on in the operating system. And this is actually a topic which comes back when you talk about cloud computing: getting a consistent view of what's going on in the system is getting more and more important and more and more difficult. And there are more presentations at this workshop that look at the topic of how to reason consistently about the system.
So what is the idea here? Carrying out experiments on the system, developing tools, developing domain-specific languages to describe certain parts of the system, like data structures, and then automatically generating code, which could be something like device drivers, that can be inserted into the system under test to get consistent information out of the system.
This is more or less the Ph.D. topic of Alexander Schmidt, and he's going to give a presentation this afternoon where he talks about SkyLab. Some people might remember Skylab; it was famous like 30 years ago. He will give more explanation of why we chose this name again.
The idea is basically using the cloud to build an environment where you can do experimentation with an operating system kernel. So you don't need to run your own test system, you don't need to set up experiments; you just go to the cloud and find a number of canned experiments, put into virtual environments together with tools and together with expected observations, like you may remember from school when you did physics experiments.
The next thing is called NTrace. This is different research, also carried out in our group. And as the name says, it is about tracing system behavior. So how can you trace system behavior? Typically you need to insert observation points, modify the code, modify the system under test. How can you do this, possibly without having the sources of the system?
And here comes a little interesting idea. If you look at recent editions of Microsoft products, then these editions are so-called hotpatchable, which means that code is generated following certain patterns, and the pattern is that whenever there is a function, there is white space on top of the function, and this white space can be used. Basically all Windows products are instrumented that way, or compiled that way. You need to use the right compiler and linker switches, which are /hotpatch and /FUNCTIONPADMIN, and then you get the white space.
And now you detect the start of a function, that's the idea, and you replace a couple of instructions there, jump to this white space, which is used as a trampoline to do a long jump to your instrumentation code. And then you basically are off and running, doing instrumentation.
You need to detect where the function starts; function boundary tracing, right, that's the topic. From the beginning of the function you jump to your trampoline. From the trampoline you go to some place where you do your instrumentation, and from the instrumentation you go back to the original function. You have modified the return address, which means you have a chance to come back into your call proxy once more to trace the function exit as well. And then you're off and the system is running as before.
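To make the mechanism concrete, here is a minimal user-mode sketch of the entry-patching idea; it is not the actual NTrace implementation, which works at kernel level and also handles multicore safety, exception state and return-address rewriting (this sketch simply traces the exit by wrapping the call). It assumes a 32-bit x86 target built with the /hotpatch compiler switch and linked with /FUNCTIONPADMIN and /INCREMENTAL:NO, so that the function starts with a two-byte "mov edi, edi" and has at least five bytes of padding in front of it; the names TargetFunction and TraceProxy are made up for the example.

```cpp
// Minimal user-mode sketch of hotpatch-style function boundary tracing.
// Assumed build (illustrative): cl /Od /hotpatch hook.cpp /link /FUNCTIONPADMIN /INCREMENTAL:NO
#include <windows.h>
#include <cstdio>

__declspec(noinline) int __stdcall TargetFunction(int x) { return x + 1; }

typedef int (__stdcall *TargetFn)(int);

// Entry to the unpatched body: skip the 2-byte "mov edi, edi" at the start.
TargetFn g_original = (TargetFn)((BYTE*)&TargetFunction + 2);

// The "call proxy": log the entry, run the original body, log the exit.
int __stdcall TraceProxy(int x) {
    printf("enter TargetFunction(%d)\n", x);
    int result = g_original(x);
    printf("leave TargetFunction -> %d\n", result);
    return result;
}

void InstallHook() {
    BYTE* func = (BYTE*)&TargetFunction;
    BYTE* pad  = func - 5;                    // padding in front of the function
    DWORD old;
    VirtualProtect(pad, 7, PAGE_EXECUTE_READWRITE, &old);

    // Long jump placed in the padding: the trampoline into our instrumentation.
    pad[0] = 0xE9;                            // jmp rel32
    *(INT32*)(pad + 1) = (INT32)((BYTE*)&TraceProxy - (pad + 5));

    // Overwrite "mov edi, edi" with a short jump back into the padding.
    func[0] = 0xEB;                           // jmp rel8
    func[1] = 0xF9;                           // -7 bytes: lands on the long jump

    VirtualProtect(pad, 7, old, &old);
    FlushInstructionCache(GetCurrentProcess(), pad, 7);
}

int main() {
    InstallHook();
    printf("result = %d\n", TargetFunction(41));  // now routed through TraceProxy
    return 0;
}
```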
So we have developed this on a binary level for the 32-bit versions of Windows, which is applicable not only to the Windows Research Kernel, but also to products. So you can trace your running experimental system, but also product retail versions of Windows.
And we had to [inaudible] consider a number of details, because your computer today is a multicore computer, so it's not too easy to decide when it's safe to modify a certain piece of code, because another CPU might well be executing it at that moment.
Next thing, if you know Windows programming, [inaudible] exception handling. Exception handling is also used inside the kernel [inaudible] extensively, which means that you basically have not only the stack but also an exception stack. You have to save not only your stack pointer but a little more.
So, implementation details. But in general this is a method which allows you to trace system behavior on a Windows system.
These are a couple of projects in the operating system space, current and ongoing research,
which we later on want to apply to the cloud. But first let's look into resource management as a
different question for service computing.
So in this area, which is often called green computing today, we have worked with Software AG. Software AG is building middleware engines, and in particular they are building a thing called CentraSite, which is a metadata-enriched [inaudible] repository, basically the one place where you want to put all your policy information about how to run certain services, what kind of resources [inaudible] for the services and so forth.
So we developed and implemented a so-called policy enforcement point for CentraSite which allows for realtime resource partitioning without changing the operating system, without changing the application server or the Java runtime and so forth.
So CentraSite is a Java thing, but the same would apply to .NET as well. And we have to note that just talking about the operating system is not enough, because the runtime environments typically also include an application server and additional software.
So what is the idea? We have an operating system and we have something called a scheduling server, which is software developed by us, originally on the [inaudible] operating system, later ported to Windows NT and to Solaris. And we also have a portable version for POSIX kinds of systems.
And the scheduling server is able to do time slicing, if you will, of the CPU so that you can give a guaranteed amount of CPU time to certain threads and to certain service invocations. So that's the idea. Here you see the scheduling server's principle. Basically you do dynamic modification of priorities for threads so that at a certain point in time your application, your service invocation, is going to get the highest priority in the system and the others are just being suspended or left basically without CPU.
And this allows you to give, say, 10 percent of the CPU to one service invocation and 20 percent to another service invocation and make predictions about how these service invocations will perform.
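As a hedged illustration of that priority-boosting principle (this is only a sketch on the Win32 API, not the group's scheduling server; the 100 ms period, the 10/20 percent shares and the busy-loop workers are invented for the example), a control thread running at a higher priority than its workers can hand each registered thread its reserved share of a period by raising its priority and parking the others:

```cpp
// Minimal Win32 sketch of the priority-boosting idea behind a scheduling server.
// A real implementation would also handle core affinity or suspend parked threads.
#include <windows.h>
#include <cstdio>
#include <vector>

struct Reservation {
    HANDLE thread;    // worker thread standing in for a service invocation
    int    percent;   // guaranteed share of each period
};

DWORD WINAPI Worker(LPVOID) {
    volatile unsigned long long counter = 0;
    for (;;) ++counter;    // runs only when the scheduling server lets it
    return 0;
}

void RunSchedulingServer(std::vector<Reservation>& rs, int periodMs, int rounds) {
    // The server must outrank every worker it controls.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
    for (int round = 0; round < rounds; ++round) {
        int used = 0;
        for (size_t i = 0; i < rs.size(); ++i) {
            // Boost worker i, park everyone else: on a single core this
            // effectively hands the CPU to worker i for its reserved slice.
            for (size_t j = 0; j < rs.size(); ++j)
                SetThreadPriority(rs[j].thread,
                    i == j ? THREAD_PRIORITY_HIGHEST : THREAD_PRIORITY_IDLE);
            int slice = periodMs * rs[i].percent / 100;
            Sleep(slice);
            used += slice;
        }
        // Unreserved remainder of the period: everyone competes normally.
        for (size_t j = 0; j < rs.size(); ++j)
            SetThreadPriority(rs[j].thread, THREAD_PRIORITY_NORMAL);
        if (used < periodMs) Sleep(periodMs - used);
    }
}

int main() {
    std::vector<Reservation> rs;
    Reservation a = { CreateThread(NULL, 0, Worker, NULL, 0, NULL), 10 };
    Reservation b = { CreateThread(NULL, 0, Worker, NULL, 0, NULL), 20 };
    rs.push_back(a);
    rs.push_back(b);
    RunSchedulingServer(rs, 100, 50);   // 10% and 20% of each 100 ms period
    printf("done\n");
    return 0;
}
```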
This brings up another interesting topic. Server and service computing, and to some extent maybe also cloud computing, are often viewed as stateless operations. So you have to encode everything you want in terms of the interface to your service in this service invocation. There is no state at the server, which might be difficult if you want to give guarantees.
And, in fact, if you look at cloud computing today, and in particular at the WSLA agreements, then you see that there is a certain difficulty in getting guaranteed performance out of service computing, out of cloud computing today.
Okay. I mentioned this already: the scheduling server, that's the idea. Next idea: if you look at service computing and also at cloud computing, it's no longer the case that there is a single invocation; you have workflows. So you want to use this knowledge about the workflow and about how things interrelate and feed this back into your execution environment.
And the other thing we learned: in order to do predictable computing in a service space or a cloud space, it's not sufficient to look at one component. You have to build up a model which in this context is called a service monitoring model, which looks at various levels: at your operating system, which might be Windows or Linux, at the application server, and at the runtime environments of certain programming languages and so forth.
And this is just an excerpt from this monitoring model and where you get the data from.
Okay. Now a few words about cloud computing. And I go quickly here because you know this already. So cloud computing comes in three different flavors: you talk about infrastructure as a service, platform as a service, and software as a service.
Infrastructure basically means you run a machine image. Platform means you have a programming environment, like Visual Studio and Azure, or Eclipse and Google App Engine. With software as a service, you just use the software; it's like Google Docs or salesforce.com.
This forms a stack again, sure, but you want to have it on a pay-per-use basis and you want to have it elastic. And then there is a kind of orthogonal discussion, and this is about whether the cloud should be public or private. So far most offerings are public clouds.
And, interestingly enough, the big players, like SAP and IBM, don't talk about the cloud at all, which means they are not talking about public clouds; they are talking about private clouds. They want to have everything behind the fence.
And my understanding of this scene is that we will probably end up with something sometimes called a hybrid cloud. So there will be both. One indicator is that right now you can already have a VPN into the Amazon cloud so that you have secure communication to your Amazon data.
Another indicator: if you look at VMware, they have something called vCloud where they allow you to establish layer 2 connectivity between cloud instances, building more closely coupled things inside a cloud.
The next interesting distinguishing question is what the unit of granularity is. Often it's the machine image, but sometimes it's also bigger, like a virtual datacenter, or smaller, much smaller. Sometimes it's even a physical machine. IBM has something called the Resource Cloud where they give physical machines to researchers and basically have a portal for managing this.
The next question is what the programming model is. We are so used to Web services that we think Web services are the answer, and I would say this might be true for the moment, but it will change. Also, how do you communicate inside the cloud: using services, using message-passing APIs? This is basically the generic solution but not the optimized solution, in particular not optimized if you look into multicore and multithreaded architectures.
There are some amazing examples, like this Live Mesh cloud application from Microsoft where you just share data and don't care. So cloud computing has many beauties. It has problems as well, with the servers and the form factors and the heat, but we pretend we don't need to care about them because Microsoft will do it, Amazon will do it, and I'm not sure how long this holds true. I think talking about resource [inaudible], sharing resources for cloud computing, will bring some of these problems back to the user again. Energy consumption, sure. And how does it all come together?
So, okay. Our idea is that we use the scheduling server or something similar for CPU partitioning in the cloud. We want to use NTrace for doing local resource monitoring and then generate probes which we deploy in the cloud to figure out how the systems in the cloud are actually doing.
There is also movement in this space. If you go to VMware, they have something called vProbes, which is basically like DTrace for the hypervisor. So you can instrument it, write those scripts to understand what the hypervisor is doing and how it gives resources to the different clients and guests.
If you want to try similar things on Azure, it's much more difficult. You have to walk through log files. You have to do it basically [inaudible] and so forth.
Next thing, as I mentioned, I believe strongly that there will be new programming models. And we want to talk about collocation of services again. We want to talk about co-management of productivity. Okay, I'll go quickly over this one; I talked about it already. So this was the part about applying known or previous research to the cloud.
Talking about programming models, there are a couple of big innovations in the computer architecture field. One innovation is that we will start seeing CPUs which are heterogeneous in terms of having different cores inside the same chip.
And the other direction will be having many, many more CPUs, many more cores, inside the same chip, which are not connected like today's SMP machines.
Examples are the Single-chip Cloud Computer [inaudible]; other examples are systems where we have GPUs, graphics processing units, and CPUs being used as compute units at the same time. And OpenCL is a programming language which will allow you to use this.
So the idea here: you have compute units and they are different, they are heterogeneous. Some of them are just CPU-like devices, others are GPU-like devices, or maybe even different. And this will form one chip in the future. And what is not clear today is what the memory consistency models will be, where we put data, how you, yeah, keep data consistent.
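As a small sketch of how such heterogeneous compute units already show up to the programmer through OpenCL (assuming only that an OpenCL SDK and runtime are installed; the program does nothing beyond listing devices), the standard host API lets one program enumerate CPU-like and GPU-like devices side by side:

```cpp
#include <CL/cl.h>
#include <cstdio>

int main() {
    cl_platform_id platforms[8];
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(8, platforms, &numPlatforms);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_device_id devices[16];
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &numDevices);

        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            cl_device_type type = 0;
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devices[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
            // The same host program sees CPUs, GPUs and accelerators as peer compute units.
            printf("platform %u, device %u: %s (%s)\n", p, d, name,
                   (type & CL_DEVICE_TYPE_GPU) ? "GPU-like" :
                   (type & CL_DEVICE_TYPE_CPU) ? "CPU-like" : "other");
        }
    }
    return 0;
}
```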
The other notable movement is towards parallel libraries, parallel computing on the chip. An example here from Intel is the Threading Building Blocks, and Microsoft is picking up on this with the concurrency runtime for .NET and the [inaudible] library, so that you basically let the CPU and runtime system decide how to partition your computation.
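A minimal sketch of that style, using Intel's Threading Building Blocks, one of the libraries mentioned here (the array size and the arithmetic are arbitrary; the analogous construct in .NET would be Parallel.For): you only describe the work over a range, and the library's runtime decides how to split it across the available cores.

```cpp
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<double> data(1 << 20, 1.0);

    // Describe the work over an index range; the TBB scheduler chooses how to
    // split the range into tasks and map them onto the available cores.
    tbb::parallel_for(tbb::blocked_range<size_t>(0, data.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                data[i] = data[i] * 2.0 + 1.0;
        });

    printf("data[0] = %.1f\n", data[0]);
    return 0;
}
```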
So in the keynote we learned about embarrassingly parallel applications. And certainly in the field there are many applications which lend themselves to parallelization easily. However, this has been the case for like 20 years. And it turns out that there are also many applications which are difficult to parallelize, and without the help of a compiler there is no chance.
Okay, time is over. Last topic. In order to study all this, we have started building a Future SOC Lab at HPI, which is run together with a couple of companies and our institute. The idea is bringing in huge computers, huge in terms of memory, huge in terms of CPUs, and developing and researching all these questions I have mentioned in this environment.
And as I mentioned earlier, I think we are talking about hybrid clouds in the future, so the question will be how we can unite Azure and the public offerings in the cloud with the datacenter.
And just going to the conclusions. So we have new programming models. We have the
multithreading multicore challenge where we need to learn parallel programming again. And
monitoring the cloud will be most important to make it successful for businesses and for
applications.
>> Arkady Retik: Thank you.
[applause]
>> Arkady Retik: We only have time for one question. Okay. So one quick question, please.
>> Just one. Talking about your NTrace, you were mentioning that you [inaudible] resource allocation in realtime. Is that correct? [inaudible]
>> Andreas Polze: NTrace is tracing. It's not resource allocation. It's just tracing -- function boundary tracing and the execution flow.
>> And you were mentioning resource allocation. I heard, I think, that you combined it with the word -- with the phrase realtime. Is that correct?
>> Andreas Polze: Resource allocation --
>> Sorry. The thing that [inaudible] -- you were also saying without changing the Java runtime environment. I was wondering how you can do realtime without changing the Java runtime [inaudible] that's doing garbage collection [inaudible].
>> Andreas Polze: No, no, no. What I wanted to say is that in order to achieve predictable system behavior, you have to do this resource reservation. And you can apply this to certain runtimes, also to the Java runtime, but you won't get any better than the runtime is per se.
So if you want realtime, you go with realtime Java, maybe, or you go with other realtime operating systems -- realtime environments, sure -- but the resource reservation or resource partitioning mechanism has to be able to honor the realtime requirements of the runtime. And I did not want to say, and it's certainly not true, that Java is realtime [inaudible].
>> Arkady Retik: Okay. Thank you very much.
[applause]
>> Andreas Polze: Thank you.
>> Arkady Retik: The next presentation is on the cloud for a University City Campus, by Professor Danilo Montesi from the University of Bologna. And the reason I paused is because Professor Montesi has a very distinguished career. He has actually taught in several Italian universities, in Portugal and the U.K., and worked in the U.S. So he brings a lot of experience, not only as a distinguished scholar and researcher; he is also the vice dean of his school, and he's thinking here about how we can take [inaudible] knowledge and help. So please.
>> Danilo Montesi: Thank you so much. And, first of all, a disclaimer: I'm not a cloud guy; I'm a database guy. So you will see that in the presentation.
Well, let me mention that we know that cloud technology is ready to change the way in which we interact with Internet services, and I believe that in this context applications and data will play a fundamental role in testing the new ideas and in improving and developing the state of the art in cloud computing.
In this talk I'm going to propose to use the cloud to create a virtual University City Campus in Bologna, which, as you probably know, is the hometown of what is supposed to be the most ancient university in the Western world. We opened for business in 1088. And so we think that in this case we can say that the past somehow melts into the future.
So here is a brief outline of the talk. I'm going to say a few words about the main actors of this cloud for the university. I'm going to talk a bit about the social, technical and geographical features of the Bologna University City Campus that we believe would fit into this cloud project. And then I will sketch the main phases of the project, starting from a pilot down to the extension. And then I will summarize the proposal.
So, roughly speaking, this is the architecture of a cloud, with the infrastructure, platform, applications, users, and the environment. We have done a bit of mapping, and we believe that the mapping for the Bologna University City Campus is essentially this: we assume that the infrastructure is already in place and that the computing power and the storage are there. What we are more interested in is how to map the gray part of the slide, so how to map the environment, the applications, the data, and, if you want, the developers.
So we will look at what the components of the University City Campus cloud application are or, if you want, why we think Bologna could be a good case study.
We've done some preliminary work. We've identified five, let's say, key elements: one is the users; another one is the environment, which is not really the computing environment, if you want; another one is the Internet connection that needs to be there; another element is the developers; and then the applications and the data that somehow we expect also to be there.
So regarding the first item, the users, the potential users: there are about 85,000 students. On top of this we have about 3,000 faculty members and 3,000 admin and technical persons, plus we have another, let's say, 3,000 people, somewhere between external faculty members and, let's say, contract collaborators who are actually doing postdocs and so on.
And this is, if you want, the user side. Regarding the environment, it's important to say that Bologna University is a multicampus university, so it's made of five different campuses. One is Bologna, another one is Forlì, 50 kilometers south; then you travel another 50 kilometers south and you find Rimini; and then you travel 50 kilometers east and you find Ravenna, and then Cesena and so on. These are the five campuses. Plus we have, by the way, another unit in Buenos Aires. I don't know why, but it has been there for a number of years now. And these five campuses host 23 faculties and 70 departments.
So we are going to talk about the Bologna campus, which is already spread over the city, and here you have buildings which belong to the [inaudible] and which alternate with private buildings and public places.
So by the environment here we mean really the physical environment. We have something called the portici, and sometimes we also get some nice weather there. So we believe that this is a good environment in which to experiment with a virtual campus.
The other element is that, anyway, students already spend a lot of time outside classrooms. Sometimes they actually attend classes even outside, in a square called [inaudible], where we have a colleague who is teaching there. He also [inaudible] two years ago.
But the environment is not enough. We also need to have a network connection. In the Bologna City Campus there are, let's say, two different wireless networks. One is called Iperbole wireless, which is the city council's wireless network and can be used by any Bologna resident or employee. Then there is another network called Alma WiFi, which is a university network, used by students and faculty members and all the other people working at the university. And these two networks are actually meshed together, so you can access one or the other with a kind of single sign-on, and you don't have to worry about where you are. So the network is there, although it's not covering the full city.
And there is also the possibility to have the people developing the cloud applications. Our university offers several undergraduate and postgraduate degrees in computing, in the faculty of engineering or, let's say, science.
And these students can obviously be users, but they can also be developers, coding applications through project work and so on.
And actually the integration of cloud technology into developing real applications can also be a way to build a competence center on the subject of cloud computing.
Talking about applications and data, there are already applications running our services. We have a bunch of Web sites for departments, faculties, degree programs and so on. But the main areas that are actually covered are listed here, on the research and the teaching side. There is an ongoing project on mobility, and there is a vision to also run virtual labs.
It's not my own vision; it's the vision of the dean of the department. And here you find a screen shot of one of the applications. Each faculty member has a [inaudible] Web site to interact with the students and so on. But in Bologna we also need to upload all our publications, which need to be, let's say, classified according to the [inaudible]; otherwise the university doesn't give us research money.
So this is a screen shot of one of my pages where I have to upload this paper, and these are the papers over the last year. So there is a centralized public database which is also used by the research ministry to rank us as a department, as a university.
Talking about other types of applications and data, here you see an application which is [inaudible]: when a student takes an exam, then you need to record this exam, you need to sign the exam. Under Italian law you need to use a digital signature, which is something with a smart card you have to put inside, and then you record it.
I've hidden some information because it regards two students who took my exam last month. And then we also [inaudible] [inaudible], which is a consortium that is providing us the facility for the digital signature things.
So that's an application that is already there, but not on the cloud at the moment.
And the students can also access and see their own exams, how much they got in terms of marks. They can change, let's say, their curriculum if they want to move into a different curriculum, or they may want to add a new course or cancel one.
On top of this there are some other systems dealing with the administrative things, you know, [inaudible] and so on. We don't care about these things for the time being.
But another good chunk of the systems there deals with the library system, which is actually a federated system. It's called the University Library System, and it contains a number of publications and online periodicals and so on. And it's something that would also need to be moved to the cloud.
So now let's look at what the main phases of this project are and figure out what the critical points are.
Well, phase 1 is a pilot phase, if you want, in which the objective is to set up a cloud infrastructure and so make some existing applications available in the cloud. As I said, most of -- or all of these applications were designed to be available through a Web site with a standard connection, let's say.
Students will access these services without coming into the labs. And only a restricted group of users will be part of this pilot phase, mainly computer science students.
And then the idea is to move labs into public places in the different locations of the Bologna City Campus. In this phase we will create the basic infrastructure that could then be used to develop custom applications that will be based on existing systems or on user feedback collected during the following phases.
So phase 2, after the pilot, is the phase that we called internal development. In this case we still involve computer science students, who now become developers. So the objective of this phase is to somehow create and spread the competence that will be needed to develop the more complex applications for all the other users, not just for the computer science users.
And so the cloud applications developed in this phase will belong to two classes: general applications that will be decided by students and instructors using this Web 2.0 feature, or collective intelligence, if you want; and the other class will be to move into the cloud other applications that are actually already available but not in the cloud.
Then after phase 2 there is phase 3, called integration, where at the beginning of this phase we will already have user feedback about things that are working and things that are not working, so, if you want, the pros and the cons of using cloud applications. So developers will have the competence in application development.
So at this point we believe that the phase of building applications of interest for a wider audience, so for all the students, will start. And obviously this will involve students and staff members of other departments as well.
And this phase is somehow the bridge toward the final step, where the first applications will be available to all people working in the university.
So there is phase 4, after phase 3, which we call the extension, which is actually the deployment and the testing and the monitoring of the service, from the current situation, which you find marked "as is," to the "to be," which is done through the cloud.
So let me say that, following this approach, there are a number of opportunities. One of these is to develop an environment and a set of applications that can be applied to several other contexts, obviously other universities or other, let's say, environments with similar features.
This will allow us to deal with real applications and so to test the potential areas of new development and also research for people working in the cloud computing area.
And then to prepare or, if you want, to train students with strong competence on the cloud, giving them all the necessary tools to understand how, if you want, to develop applications in this environment.
Besides these positive elements, there are some critical aspects, which are listed here. The first one is privacy and data location. Under Italian law, as a public administration we are required to have complete knowledge of the location of our data.
So, for instance, last year we tried to outsource the e-mail system of our 85,000 students to Google. We [inaudible]. Google and other companies came and said, well, we can do it, but there was a specific item saying we need to know, for each e-mail that is being sent or received through these e-mail accounts, where the data is located, because we need to know which legal environment we have to deal with.
They couldn't give us a precise indication of where all the messages were. So we had to drop this opportunity. And we need to keep in mind the same things for the cloud.
The other critical aspect is acceptance. Although it appears that people would love and would like to use these technologies, somehow change is always a risk, so we need to manage this risk properly.
Among the other things, there is the issue of the network, let's say, coverage. People are expecting to have full coverage now with mobile things, 3G things. They expect to have everything everywhere. But this is a Wi-Fi network, so it covers only some areas of the City Campus, and people coming by train are expecting to access the network before they arrive there. So obviously they will use the classic 3G, or in the near future the classic 4G network, which is outside our reach.
And linked with this issue there is the mobility element. We already have a project funded by Vodafone which is actually already trying to move some applications so that they are accessible from a mobile device. But we've already seen that there are a number of issues that need to be reconsidered; anyway, these are separate issues that we would have to address with or without the cloud, so it has to be done.
Okay. Let me summarize our proposal so as to finish in time. Well, first of all, this is the proposal: to build a University City Campus based on cloud technology. And the objective is obviously to do it for the good of the students and the faculty members and everybody who is working around this project.
But the aim is also to test and develop these techniques in a real environment. And we believe that Bologna is a nice place to do it.
And the other element is that we think that building this system through a divided team, if you want, developing it in smaller chunks and more phases, and using the computer science students to gradually reach all the objectives of the proposal, is also something that could be interesting to experiment with.
Okay. Thank you so much. Questions?
[applause]
>> I have a question. Do you think that the governmental laws will change around governance and student identification? It seems to be a challenge for all of us as we go to the cloud, and usually security and governance is a blocker because of regulation.
>> Danilo Montesi: Well, usually, yes, laws do change with new governments. It appears that, anyway, they tend to become more restrictive instead of, if you want, more open. As for the privacy law, I've been told by our lawyer in the university, and by colleagues in the faculty of law, that it is supposed to be very good, although quite strict.
And, anyway, we would like to have a different legal environment, but we have to face what is here. For instance, this issue of the digital signature. I don't know. When I was working in the U.K. they didn't have it -- I was just filling in an Excel form, A, B, C, this was the grade, you stick one of these on your door, you give another one to the [inaudible]. It's done. It was very nice and very simple.
When I moved back to Italy, and actually to Bologna, I had to face these things, this little card and then the USB driver and the Java virtual machine, which is not the [inaudible], and then this browser is not the one that they are supporting, and so on.
I would love to remove all of these. I cannot, I should say. These things are -- the digital signature is something which is, I believe, much bigger than obviously Italy [inaudible] things; it is also being used in the United States, in a number of states. And it's also used, I believe, in other environments, like, you know, in hospitals, where indeed they need to put down things.
And so I didn't answer your question; I'm sorry.
>> I think it's a great project. And I'm interested in hearing about the stages of [inaudible]. So has the university approved this, and when do you hope to start phase 1?
>> Danilo Montesi: Well, the mobility part has been approved and funded outside the university budget. And we are looking actually to complete the other part, to fund the cloud part.
>> And what's the timetable?
>> Danilo Montesi: Well, for the mobility it's about 18 months, although a few services will move sooner so that they could be accessed from [inaudible], because there is also the issue of reorganizing all the content for a mobile device, which is a bigger [inaudible] editing [inaudible], if you want, somehow. Yes.
>> What about privacy issues? Isn't the university worried about privacy on the cloud? This
seems to me to be the greatest factor [inaudible].
>> Danilo Montesi: Well, as I said, if you take Gmail as an example of a cloud -- which it's not, by any means -- as an example of a service for which you don't know where your data is actually physically stored, you don't know if it's in Sweden or in Ireland, in that case we have been told we cannot do it. So they put on paper that we need to know the location of all the data.
So for us cloud means that it could be anywhere on the national territory. So you could put a data server in Turin, in Rome, in Milan, in Florence. You can move the data here and there. But if it -- but the -- I mean --
>> [inaudible] my university requires that we have accounts from the students [inaudible].
>> Danilo Montesi: We had to drop the Gmail offer because they couldn't tell us, for each message being sent or received, where this message was, under which jurisdiction, whether it was in Sweden, in Ireland, in England, or in Finland. And since they couldn't say where it was -- because they told us that even they don't know where the messages are, because they have a redundant system, organized in such a way that they don't know where all these messages are -- we said we're sorry, and naturally they [inaudible] outside of this project.
>> [inaudible] all of the students typically have [inaudible].
>> There seem to be two issues here; one's very political.
>> Danilo Montesi: Yes.
>> And the other is the technology. We could argue that the [inaudible] would be that student records are better served by a cloud service than by being floated around the campus somewhere, and the example you gave, that's paper based, I mean, that's not secure at all. We've seen many, many breaches of identification in the United States around just that very issue, of being very loose around identification.
But to me it's really a political issue, that the traversal [inaudible] a lot of countries do not want to have their name -- their records sent somewhere else other than their own geographies. That's a political issue. And I'm wondering if that's -- that to me is the obstacle here, not necessarily the technology issue. It's a social issue.
>> [inaudible] you say political. I think it's legal, really. That is to say that if my mail is stored in Australia rather than Italy, maybe the privacy law in Australia differs from that which is applied in Italy [inaudible] and maybe my [inaudible], so it's -- I don't know whether that is political or legal.
>> I think it's all tied up in the politics.
>> It's not technical [inaudible].
>> Or legal -- right, it's not technical, that's the point.
>> Yeah.
>> Arkady Retik: Okay. Last question, please. One more? Thank you very much.
[applause]
>> Arkady Retik: Our last presentation of this session is on cloud computing projects in engineering. This submission is also by two professors, two professors from Colombia: Harold Castro, who will be doing the presentation, and Jose Canandez [phonetic], who is a coauthor. Both are distinguished scholars and researchers as well.
Professor Castro, who will be doing the presentation, is a professor of computer science at the University of the Andes, and he is involved in [inaudible] and research. As a researcher he is the director of the COMIT group and his colleague is also the director of another group, and what they show is actually a synergy of putting together computing and engineering.
>> Harold Castro: Thank you. Okay. I'm going to present the evolution of the work we have been doing at Universidad de los Andes. First of all, I would like to give you the context: what our university is like, why we are working on these kinds of projects, and then how we got to the point where we are.
To do this I'll have to present my view of the problems faced by universities, and specifically by computer science departments, how we approach these problems, and how we are supporting our solution on a cloud strategy. Finally I will present the projects we are working on.
Well, first of all, Universidad de los Andes is located in downtown Bogota. And it's a private university; that's an important issue. Its main school is the engineering school. Within the engineering school we have a systems and computing department. It's not a computer science department; it's not science, it's engineering. And that means that our focus will be to produce solutions that will be implementable in our context.
In my view of a university, and Universidad de los Andes is a good example in our context, we have to attack on two fronts, research and education; that's clear for us. We have to put resources into both directions. To do our research we will need computing power, in any field. We have seen this in the keynote: we need computing power and data management.
Now, we want our research to be faster, more accurate, bigger, in big collaborations, and optimized for the [inaudible] resources we have. And we need education. And when I say education I am thinking especially of computer science, or computing, or systems and computing, whatever the name in English would be. We need experimental laboratories, experimentation that will be flexible but that will be real.
Our students need to have access to the latest technologies, but those technologies that are being used by industry, not the traditional open source solutions that we tend to use in universities. And of course we have to optimize the resources that we have.
So, our research. Universidad de los Andes is organized in a federal way -- Universidad de los Andes is a federal organization. That means that each school has its own resources, and the research they do is limited by the resources they have. There is no campus-wide resource for research.
So science -- and not only the science school but the biology department and the other sciences -- they have their own resources. And I would say that each faculty has its own resources that it can use.
We are working on several international projects, and so we need the tools to collaborate on these projects. And it's clear for us that the worldwide trend is to go in this direction. So we needed to set up something that we call the Campus Grid Uniandes initiative, because to do our research under these conditions a grid appears to be the way to go.
So we started this initiative in 2007. It was led by science, together with the engineering school -- my department, which is systems and computing -- and the DTI. The DTI is the information technology office of the university, and this was the first time that this division was involved in an academic project.
Traditionally they have been supporting the administrative processes of the university. They have their Web sites for the academic side and all this stuff, but they are not usable for research or for the labs of the academic side. So they only support the infrastructure.
We needed an initiative that would be linked to the world using our local Internet2 connections. In Colombia, and all through Latin America, we have [inaudible], which is the Internet2 approach for Latin America. In Colombia we have [inaudible], which is the way we have to link to this [inaudible].
So we decided to use this infrastructure, which is not very big, but at least it's connected to Internet2 and [inaudible] in Europe.
And now we are connected to this -- as we are part of the CMS project of the [inaudible] in Europe, we are connected. Our site is currently connected to EGEE, and to EELA, which is the initiative for a grid between Europe and Latin America and is already working, and to the ROC-LA, which is our high-energy physics solution for them. And we are working to link it to the OSG initiative here in the States and to GISELA, which will be the new EELA project in Latin America to build a generic grid for all the institutions in Latin America.
We have a national projection of this initiative at the Universidad de los Andes: we are leading the Grid Colombia initiative to build a national grid initiative in Colombia. And for that we are hosting the Colombian certification authority.
And the mandate from the university was to focus on the applications. They know that the grid is great for physics, but they asked me especially to work with new schools to find new applications, new users, to build our infrastructure for the whole university and not only for the scientists -- in fact, for the physicists.
So I started to work with [inaudible] bioinformatics, optimization, [inaudible] computational chemistry and different new projects.
This is the current status of the initiative. We have this central grid hosted by the DTI, the information technology office at the university, linked to CMS and EGEE, and soon to OSG.
We have storage and processing, a medium-sized cluster, and the big idea was to link in the resources from other units at the university. So we have here a 20-core cluster, here they had several servers, and bioscience also have their own resources.
And in engineering we didn't have big clusters, in fact, but ISIS, which is systems and computing, my department, had computer labs that we know have a lot of available capacity that can be used in this context. And that was our work.
Then we wanted to increase the computing power, but remember I said at the beginning that Universidad de los Andes is a private institution, and, being a private institution in Colombia, there is not too much money for research, but there is a strong scheme for funding the education activities.
Because most of our budget comes from the fees, the tuition fees from the students, all the money has to go back to these students. And most of our students are undergraduate students. So computer labs are very widely present on our campus. We have more than 2,000 or 3,000 machines available for them.
But for research there are very big difficulties. We have a National Science Foundation-like agency in Colombia, but its budget is really nothing compared with that of any developing country.
So the university does big investments from time to time. Three years ago they built a new building for the engineering school. They invested $20 million in labs and infrastructure. But that's a thing that we know we are not going to have again in the short term. So we have the labs, and now we have to be sure to find new ways to fund our research.
And to find funds I have to attract new users if I want to convince the directors of the university to keep investing in this initiative. But it's difficult, because of the inherent complexities of the grid system, because of the real architecture that we have to set up to comply with international standards, and because, finally, not all the applications are easily parallelizable. But even if they are, someone has to do the [inaudible].
So taking all this into account, we extended our concept of the Grid Uniandes initiative to UnaGrid, which is a step forward: to build an opportunistic grid around the campus and try to solve all the other issues that we are facing right now.
UnaGrid is then -- well, it's called a desktop grid and volunteer computing system, introducing a new concept for us, which is the customized virtual cluster. That is our idea.
Okay. Every research group is used to having a specific environment. We need to reproduce that same specific environment on the available infrastructure. I don't want the researcher to learn a new environment. I don't want to force them to learn Condor or MPI or a new specific environment, I would say. I would rather reproduce the specific environment that they are used to working with, using virtualization and priority management on the computer labs, and using some dedicated servers to play the master role of a typical cluster. We can ensure that the quality of service that they are going to receive is acceptable for them.
We developed Web interfaces to give access to this infrastructure to both administrators and regular users. And we started developing projects on [inaudible] following this idea.
Then, looking at the education that we are doing in our department, we found that we wanted our students to be exposed to as many technologies as possible, but also in environments as real as possible.
So we developed a software architecture laboratory which uses the idea of UnaGrid, which is the customized virtual cluster. The idea here is that we built five enterprises, real enterprises [inaudible]: two manufacturing enterprises, one producing furniture and another one -- I don't remember which, I think it is a milk processing industry -- a telco company, a bank, and a university. We know this business, so it was quite easy.
We signed agreements with the real vendors of the products that are being used in Colombia and everywhere: SAP, Siebel, PeopleSoft. We convinced these people that we could set up a laboratory where they can take their customers and show real implementations and real solutions working on real platforms.
So they help us, they give us the licenses. They came to our university to install and configure these technologies for these industries.
And we use different platforms to set up this infrastructure, also with agreements that we have with Microsoft, IBM, and Oracle, and of course we also have open source solutions.
Then we started to try to mix both worlds, research and education. We found that we had two different kinds of users: system administrators, who have some difficulties getting their work done on this kind of platform, and users who want to get to their solution without being aware of the platform that is involved.
So we did a lot -- these are the kinds of Web interfaces that we developed, for administrators to set up computer labs and to deploy these customized virtual clusters, and for researchers so they can request the number of machines they want to deploy and the time they are going to use them, and they can go and monitor the machines that they are really using, the physical machines that they are using, and how their projects -- their work -- is progressing.
We found new difficulties. The administration is very, very difficult: the manual deployment of a virtual cluster, the IP assignment -- we have public addresses, but it's difficult to assign these IP addresses to the virtual machines. We don't have any accounting of what is going on on these platforms. And users complain also because the interface is not natural for them, and we had to reengineer the whole project.
And the answer was the cloud. The cloud ideas come to the rescue because the cloud has inherent characteristics that are very good for what we are doing.
We were already using some of them -- virtualization, customized environments, on-demand deployment, delegated administration -- and users don't care about how these customized virtual clusters are operated. But there are some others that can be integrated that we see can be very useful.
For example, on-demand configuration -- and here I'm thinking of a faculty member expressing the kind of scenario that should be used by [inaudible] students. I would like to have physical machine transparency; right now we must know which computer room is going to be used. I need accounting. I need new advanced administration tools and different complex cluster configurations.
So we are thinking of Clouder, which is the name we gave to this project, which is an opportunistic cloud computing platform implementing an IaaS service model. We started the development of this Clouder to make it as close as possible to what researchers are used to when they think about the needs of a computing platform.
So they have to express the kind of operating system they want to use, the kind of resources they are going to work with, and then the applications -- the specific applications they want to install. And with this idea we identified three different kinds of projects that we are going to set up on this Clouder.
The traditional cluster-based research projects, which use the master and slave model; some number-crunching interactive projects, because my colleague who wrote the abstract with me is working on how to have high-performance visualization for his work; and these dynamic scenario projects for the computer [inaudible] laboratory.
So we started that. We have some results right now. We have worked with the laboratory for bioinformatics, and they are doing DNA sequencing using our opportunistic resources. We mix now [inaudible] they have a cluster, so we can mix both dedicated and opportunistic resources.
And for them the model is the same: they have the traditional cluster, but they don't know that this cluster doesn't really exist. They are going to send all the requests to the cloud. And if everything goes well, the cloud will solve the problem. Right now it's not that transparent. They know where it's going to be executed. They don't have all the facilities, but we are working on it.
The visualization team, they expect to have a big facility to do their number crunching and a local facility to deal with the visualization problem. And by visualization we mean things like this: different screens, integrated and [inaudible] in parallel, with collaboration tools within the same environment.
And, again, we are now doing this not in a cloud-transparent way, but we are asking our friends in Alberta, who have a big cluster, to do the processing and manage the data. We receive the results through Internet2 [inaudible] and all this network infrastructure, we locally manipulate the scenes that we are receiving, and we achieve this kind of visualization in our facility.
And for the multi-scenario projects, we are imagining that a faculty member will decide which environment he wants to produce for his students. For example, he will say, okay, I have the bank here, the university here, the manufacturer here, and I want to use SAP, or Siebel, or whatever software he wants, to produce the scenario that he wants the students to work with. And once he has configured this project, the students will come and deploy the scenario on demand using the opportunistic infrastructure we have.
We are not doing this right now using the [inaudible] service, but we are working on it.
That's what I wanted to say. So that's it.
[applause]
>> Arkady Retik: We have time for questions. Please.
>> We're trying to do something similar to what you have described, but we haven't started yet. It's part of the project that my colleague Danilo was describing before your talk.
One of the things we are trying to assess -- I mean, the model we have in mind is that you as a professor enter the room with 80 students and you teach a lecture, you're lecturing on middleware systems, and the idea is that we would like them to download all the [inaudible] technology they require for your lecture and exercises, say application servers and stuff like that, by magic. They do not necessarily have it on their laptops, so it should be downloaded from some magic cloud.
What we are trying to assess is how much bandwidth we need. I was wondering whether you had some sort of firsthand experience on that.
>> Harold Castro: At the beginning we used a single server to hold all the images, and it doesn't work. It simply doesn't work, because they are very heavy. And remember, we wanted to use that in the computer labs. So what we had to do was to have a local copy of each image on each computer in the computer lab.
That is feasible because our machines, as I said at the beginning, are paid for with the tuition fees of the students, so they ask for good machines. So we invested in nice machines with a lot of disk and RAM. So I'm taking 100 gigabytes of that disk to store a local copy of each one of the images that they are going to use.
>> [inaudible]
>> Harold Castro: Yes. Yes. Exactly. Which makes deployment difficult. That is where our next step is: how to deploy that easily. But right now we are doing this, which is very difficult, also because of the problem that I mentioned of the IP assignment; you have to change it. So we are working on how we can assign the IP address at boot time.
And the other problem will be how to deal with these local copies that we have to maintain. [inaudible] that's difficult. But right now we are doing [inaudible]. Each image is around 2 or 3 gigabytes, meaning that we can have 40 images with no problem; nobody is missing those gigabytes.
>> Does that mean you have a limited number of laboratories?
>> Harold Castro: Yeah, yeah.
>> Because we were wondering how to expand this [inaudible].
>> Harold Castro: Right now I'm using the best laboratories to do this. And if I'm successful with this new project, I will go to the other labs. But that will be a difficult issue.
>> Arkady Retik: Any more questions?
>> [inaudible] have a similar situation. We're also working heavily with virtual machines. And usually the system administration pushes all these images down to the PCs, perhaps the day before you need them. So [inaudible] using this approach seems [inaudible] works very well.
>> So you would have 24 hours, say --
>> Harold Castro: Yeah. You have to plan ahead a little bit.
>> [inaudible] several days before [inaudible] the administration cannot automatically deploy [inaudible].
>> [inaudible] we have this system that deploys images automatically [inaudible] images you need in advance, which is quite problematic. I wonder if that doesn't defeat a little bit [inaudible]. The solution of pushing the [inaudible] images to each machine I think kind of defeats the purpose of a cloud environment; at the other end [inaudible] I have a cloud, but I don't really have a cloud, since I have to provide for all these local images.
>> Harold Castro: That's right. That's because of the kind of infrastructure we are going to use. Otherwise --
>> And I think it's specific to universities, because you have different labs all the time. In a commercial environment you don't really change the image as often. So maybe this is kind of a university solution.
>> Arkady Retik: Any more questions? Thank you very much. Thank you to the speakers.
[applause]