>> Rustan Leino: Good afternoon, everyone. I'm Rustan Leino, and we're going to give you an overview today of work that Jean-Raymond Abrial and I have done in the last five weeks on thinking about a hypervisor. The start of this work came from the observation that when we think about code development today, the typical way it's done is that you write some piece of code, then you write test cases and hope that the test cases will work, and you debug the code and find it doesn't really work and you change it, and of course you leave many bugs in the code, and it ships, and you go back and fix them, and the whole process is very, very expensive. So what we'd like to try is a different technique for developing code. The different technique is a top-down technique: starting from requirements, starting from specifications, and deriving the eventual code step by step in a gradual way. There are techniques for doing this, and we're using one that Jean-Raymond was co-inventor of, the Event-B method. To have an application for putting these techniques into practice, Wolfram Schulte had suggested that maybe we should try a hypervisor. There was a recent effort to verify the Microsoft Hyper-V. The Hyper-V is a big body of code, let's say 100,000 lines, and a three-year project, a joint effort with Saarland University, set out to specify the hypervisor. They used the VCC verifier, and that specification and verification was done after the code was written, and that's always much more difficult. What we're trying to do here is do all the specification and so forth up front so that the code comes last; that's the top-down approach we're taking. So what we want to do is produce a hypervisor. A hypervisor is a virtualizing piece of software that takes one physical chip and virtually gives it out to a number of operating systems, so that each operating system can execute on this chip as if it had full control over the chip. We're assuming that there are several cores in the machine, which is realistic these days, and what we're trying to do is distribute that computation in such a way that when each operating system is running -- that is, each guest operating system, as they're called -- it is not aware that the others are running. That is, you cannot detect, from the guest's standpoint, whether you're running under the hypervisor or directly on the hardware yourself. That's the idea. That's, for example, what the Microsoft Hyper-V is doing. So that means there are a number of things that you have to virtualize for these guest operating systems, and the main ones are memory and interrupts. There are also timers and maybe other details, but we're going to focus on memory and interrupts. All right. The general strategy for doing this kind of work with a top-down development is that we start with a bunch of requirements, and we're going to show you a flavor of the requirements that we write. The requirements are stated informally but as precisely as possible, and then you develop some strategy for doing the refinement, which is to say, well, we're going to introduce the following features first and then introduce more and more detail into the whole thing. That's the refinement strategy.
And each step in carrying out that strategy is a formal model. So you develop a formal model that describes the pieces of the system that you're trying to develop at each point. Here, for example, there are choices of which things to introduce first, and as you will see, it comes very naturally to start with individual machines that are running, because if you can describe individual machines running in isolation and then refine everything into a system where things share the hardware, then you have, from the very beginning, proved that you have machines that operate independently of each other. This is also what was done -- or a similar technique was used -- in the seL4 verification project, where they started with some high-level specifications or simulations and tried to refine things, but we're doing it using a different method here. And then we'll conclude. So with that, I'm going to turn it over to Jean-Raymond, who will take us through the technical development.
>> Jean-Raymond Abrial: Thank you, Rustan. Good afternoon, everybody. I'm Jean-Raymond Abrial, and I'm going to continue. So, first of all, we're going to discuss the requirements document. We do not rush immediately into the formal modeling. The requirements document is made of two texts, the explanatory text and the reference text. For the moment in the requirements we have only some sort of short explanatory text, but we put a lot of effort into developing the reference text, which is made of short statements that are labeled and numbered. So the first task, for this project, is precisely to define the various labels [inaudible]. And here they are. As Rustan said, we are first going to develop the systems independently, so single-system memory handling, SM, and single-system interrupt handling, SI, and then we enter into the notion of multiple systems under the control of the hypervisor, so we have MS and HV for the general requirements for the hypervisor, then the handling of the memory, which corresponds to this, the handling of the interrupts, which corresponds to this, and in the middle the problem of scheduling the various guests by the hypervisor. Okay. So let's go through these requirements, because again and again we insist on the absolute necessity of writing down those requirements very carefully. Rustan and I spent a lot of time writing them, and also having interviews and mails and telephone exchanges with some experts who gave us some clues in order to understand the problem, because this is their normal job. Here I'm going to go a little fast through them; you can just read them. So the single-system memory handling. We have an operating system that uses memory, devices, and some timers, and the amount of memory accessible by an OS is determined at boot time; we call these addresses IPAs, intermediate physical addresses. And, by the way, if the OS tries to access memory through a nonexistent IPA, then it crashes. That's unfortunate. Then the handling of the interrupts -- well, so much for the memory; there are, of course, far more things in the handling of the memory for the OS, but from the point of view of the hypervisor we're not interested in that, because this is the private business of each OS. So for the interrupts, we consider only uni-processor systems for the moment.
So an OS can be interrupted by some devices that are connected to the unique CPU running the OS. Now, some properties of the interrupts: each interrupt has got a priority, which is a natural number, and then we have a notion of state for an interrupt: it can be inactive, pending, or active. And the state of an interrupt is managed by the hardware. So now I'm going to describe these three things a little more carefully and explain them with some short statements like this. An inactive interrupt is one that has not happened since it was last treated. A pending interrupt is one that has happened but has not been treated yet, because its priority is not high enough. An active interrupt is one that has happened and is either being treated now by the CPU or has been preempted by an interrupt of higher priority, because we can have nested interrupts. Now, when a pending interrupt arrives, if its priority is high enough, it results in the hardware sending a signal to the concerned core of the OS; the OS acknowledges this signal, and the hardware makes the interrupt active. But, of course, for a pending interrupt to interrupt an interrupt that is now being treated by the OS, it has to have a higher priority. So it goes on top of it and interrupts the interruption. And so, among the active interrupts -- this is the consequence of this -- the one that is being executed is the one with the highest priority. And here it's a bit redundant with what I've said before: a running interrupt can be preempted by a pending interrupt with a higher priority. At the end of the interrupt handling, the OS sends an end-of-interrupt signal to the hardware, which makes the interrupt inactive. So this is the story of an interrupt. An interrupt can be masked or unmasked by the OS. So much for the individual interrupts. Now let's go to the hypervisor requirements. The role of the hypervisor is to simulate, on a single machine with several cores, the behavior of independent guest OSs. You can have a look at this [inaudible] here. We have several OSs that are independent, and we want to simulate this with the hypervisor. So the hypervisor, as Rustan said, has got as many interrupts as we have here, but you see the numbering is not the same. These are the physical interrupts, and for each individual guest these become the virtual interrupts. And here we have a hardware device called the interrupt controller, and of course the hypervisor machine has got several CPUs that are connected to the interrupt controller, and they share a [inaudible] memory. So what we have to do is put all these guys here into this list here and put all the individual memories here inside a unique memory, so that with the hypervisor we can simulate completely the behavior of several OSs. We have a number of elements that are determined at boot time, and the number of guests is fixed; we cannot add another guest after boot. And, of course, the main thing is that, as Rustan said, a guest is not aware that it's being executed on the hypervisor. The guests are really independent. This is a very strong requirement: a guest cannot pollute another guest and doesn't even know that there are other guests.
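Going back for a moment to the single-system interrupt requirements above, the inactive/pending/active life cycle they describe can be sketched roughly as follows. This is a minimal Python sketch, not the Event-B model; the class and method names are illustrative only.

```python
# Sketch of the single-system interrupt life cycle described in the requirements:
# inactive -> pending -> active -> inactive, with priority-based preemption.
from enum import Enum

class State(Enum):
    INACTIVE = 0
    PENDING = 1
    ACTIVE = 2

class Interrupt:
    def __init__(self, priority: int):
        self.priority = priority        # a natural number, fixed per interrupt
        self.state = State.INACTIVE
        self.masked = False             # the OS may mask or unmask it

class Cpu:
    def __init__(self):
        self.active_stack = []          # active interrupts, highest priority on top

    def raise_(self, irq: Interrupt):
        # the hardware records a (non-masked) interrupt as pending
        if irq.state is State.INACTIVE and not irq.masked:
            irq.state = State.PENDING

    def acknowledge(self, irq: Interrupt):
        # the pending interrupt is taken only if it preempts the running one
        running = self.active_stack[-1] if self.active_stack else None
        if irq.state is State.PENDING and (running is None or irq.priority > running.priority):
            irq.state = State.ACTIVE
            self.active_stack.append(irq)   # it goes "on top" of the preempted one

    def end_of_interrupt(self):
        # the OS signals EOI; the hardware makes the running interrupt inactive again
        if self.active_stack:
            self.active_stack.pop().state = State.INACTIVE
```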
And now, of course, one of the responsibilities of the hypervisor is to schedule the guests. The idea here is that we might have more guests here than there are CPUs here, so not all the guests are running; some of the guests might be sleeping. So it is the role of the hypervisor, from time to time, maybe at some time slice, to eject one of the guests from a CPU and put another guest in its place if that one has been sleeping for a certain time. So you can imagine that there is a lot of switching that has to be done when doing this scheduling. And, of course, we require here that the hypervisor is fair: from time to time it has to give control to each guest, of course. Then the hypervisor handling of the memory. So we studied two things, as Rustan said: we studied the memory first and then the interrupts. The interesting thing is that for the memory we go from a virtual memory to some physical memory, and for the interrupts we just go the other way around. So all the memories of the guests are embedded into a single memory, that of the hypervisor, and to make this easy the machine here, the hardware, has got a device called the SLAT, the second-level address translation, which keeps track, for each guest, of a one-to-one mapping between each intermediate physical address of the guest and the corresponding physical address in the memory of the hypervisor. And this connection is one-to-one, which is very important, because we do not want the memories to overlap. So we have this table called the SLAT, and the thing is relatively simple: when a guest wants, for example, to write in its memory, it does not know it, but the hypervisor traps this write operation and, behind the curtain, goes through the SLAT, which is a one-to-one correspondence established at boot time. So the SLAT translates this IPA, this intermediate address [inaudible], into a real physical address. And it is a device that exists in several machines now, which is extremely nice. The size and content of the SLAT are determined at boot time, because at boot time we fix completely the portion of the memory of the hypervisor for each guest. So this is determined once and for all at boot time, and, again, this is an injective, a one-to-one, correspondence, so that the memories do not overlap. The SLAT itself for each guest is addressed by means of a base address. So there is a SLAT for each guest, and for each guest there is a base, which is the starting address of the table of the SLAT. And I will describe more about the SLAT here. So here, this is a description of what I've just stated: the connection through this map of an IPA to a PA. And, of course, if this process fails, it is because the IPA that has been provided by the guest is a bad one, so we have a second-level page fault, and the hypervisor reports it to the guest and probably kills the guest, because it has tried to access a faulty address. Okay. And this is described here.
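At this level the SLAT is just the abstract view from the requirements: for each guest, an injective map from IPAs to PAs, fixed at boot time, with a second-level page fault when the guest presents a bad IPA. A minimal sketch of that view (the names and the dictionary representation are illustrative, not taken from the models):

```python
# Abstract view of the SLAT: an injective (guest, IPA) -> PA map fixed at boot time.
class Slat:
    def __init__(self, mapping):
        # mapping: (guest, IPA) -> PA; injectivity means guests' memories never overlap
        assert len(set(mapping.values())) == len(mapping), "SLAT must be injective"
        self._map = mapping             # fixed once and for all at boot time

    def translate(self, guest, ipa):
        try:
            return self._map[(guest, ipa)]
        except KeyError:
            # the guest presented a bad IPA: second-level page fault, reported by
            # the hypervisor (which would probably kill the guest)
            raise RuntimeError(f"second-level page fault: {guest} @ {ipa:#x}")

# Example: two guests with disjoint physical regions chosen at boot.
slat = Slat({("g1", 0x0): 0x1000, ("g1", 0x1): 0x1001, ("g2", 0x0): 0x2000})
pa = slat.translate("g1", 0x1)   # -> 0x1001
```

The injectivity assertion is exactly the property that reappears later as the fundamental theorem about the tree-structured implementation of the SLAT.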
Now, the SLAT could be quite big, so the SLAT is in fact structured like this here. Rather than being a unique table, which would be far too big, it is organized as a tree structure of pages. So an IPA is split [inaudible]: it has got 10 bits on top, 10 bits in the middle, and 12 bits in the lower part. We have the base address of the SLAT for a certain guest, then we have the top word page here, and then we have a second level and then a third level. And this thing is extremely simple. These pages here are of size 2 to the 10 [inaudible] word pages. So the top 10 bits are taken; they address a certain element in this top page, and that element points to a second-level page, which is also a word page. Then we take these 10 bits here, they address this element here, and this element now points to the byte page. This is a byte page, so 2 to the 12, and these 12 bits here address into it. And the memory is here. So the memory of each guest is in this portion, but you see we have these two levels here -- or three levels, one, two, three -- in order to access it. This is all explained in those requirements; you see we pushed the requirements down to this low level. And, of course, this is very time consuming, because you can imagine that in order to go from an IPA to a PA -- the PA is here, the IPA is here -- we have to walk through two levels. So it's two memory accesses. It's far, far too much. So there is something called the -- the TLB -- what's the name of --
>>: [inaudible]
>> Jean-Raymond Abrial: -- oh, translation lookaside buffer, and this is a [inaudible] memory, a small associative memory of size, you know, 16 or 32, which maps an IPA directly to a PA. So when you walk through this, if the address was not in the TLB, then you do the two accesses, and then you push the pair into the TLB, probably evicting some pair from the TLB, and next time -- because if we access some memory, we can expect that we will pretty soon access the same memory again -- the TLB does a short circuit between the IPA and the PA. So there is a lot of hardware doing this. And this is what is explained here. So, so much for the memory.
>>: [inaudible]
>> Jean-Raymond Abrial: You flush the TLB --
>> Rustan Leino: When you schedule a new guest operating system, you have to flush it.
>>: You have an individual TLB for each machine?
>> Jean-Raymond Abrial: For each CPU here, yeah.
>>: For each -- for each core?
>> Jean-Raymond Abrial: For each core. For each core.
>> Rustan Leino: Which means that the guests and the hypervisor running on that core will have to share the TLB.
>> Jean-Raymond Abrial: Okay.
>>: I guess I just don't understand why you're modeling it at this level. I mean, if the idea was to come at this top-down, like even the three-level mapping -- did you need that at the beginning, or could you just have modeled it -- I mean, the point was to model this thing where these OSs share the hardware.
>> Jean-Raymond Abrial: This is exactly what we are doing. You will see exactly when I show you the demo. At the top level, at the abstract level, the SLAT is not at all like this. The SLAT is just -- and then we will find --
>>: [inaudible]
>> Jean-Raymond Abrial: -- and this is the ultimate refinement, so we go down. Okay. But this is not the first level, of course. But you raise an important point here. The problem is that there is some sort of hierarchy in the requirements: there are requirements that are very general and then requirements that are more particular. And precisely, this is the role of the refinement strategy, to impose this hierarchy and to say, oh, we are going to take this and then this and then this and then this, so that we can go down. Okay?
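To make the two-level table walk and the TLB short circuit concrete, here is a rough sketch. The 10/10/12 split and the small associative TLB follow the requirements; the data layout (plain Python dictionaries and lists) and all the names are purely illustrative.

```python
# Sketch of the three-level SLAT walk (10 + 10 + 12 bits) and the TLB short circuit.
def split(ipa):
    """Split a 32-bit IPA into its 10 / 10 / 12 bit fields."""
    return (ipa >> 22) & 0x3FF, (ipa >> 12) & 0x3FF, ipa & 0xFFF

def walk(base_page, pages, ipa):
    i1, i2, i3 = split(ipa)
    l2_addr = base_page[i1]          # top word page, indexed by the upper 10 bits
    l3_addr = pages[l2_addr][i2]     # second-level word page, next 10 bits
    return l3_addr + i3              # byte page: the low 12 bits give the offset

class Tlb:
    """A small associative memory (say 16 or 32 entries) mapping IPA pages to PA pages."""
    def __init__(self, capacity=16):
        self.capacity, self.entries = capacity, {}

    def lookup(self, ipa_page):
        return self.entries.get(ipa_page)   # PA page, or None on a miss

    def insert(self, ipa_page, pa_page):
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))   # evict some pair when full
        self.entries[ipa_page] = pa_page
```

On a miss the walk costs two extra memory accesses, which is why the result is pushed into the TLB; on a hit the TLB short-circuits the IPA-to-PA translation, as described above.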
So let's go to the interrupts now. The interrupts were a little more difficult at first, because we had to ask the experts, and they gave us [inaudible] some information. So the interrupt system of the hypervisor is virtualizing these interrupts. You remember the situation here: we have the individual interrupts for each CPU, and we have all the interrupt levels here for the hypervisor. So, first of all, a one-to-one correspondence -- an injective connection -- is established between each physical interrupt and a pair made of a guest and an interrupt. For example, the physical interrupts 1, 2, 3 are mapped to 1, 2, 3 of this guest, then 4, 5 are mapped to 1, 2 of the second guest, and 6, 7, 8 to 1, 2, 3 of the third guest. And, again, this one-to-one connection is established once and for all at boot time. So each physical interrupt is connected to a single virtual interrupt corresponding to a guest. Now we have the problem that -- here, you'll remember, the interrupt was physically arriving, and here the interrupt was virtually arriving, but what physically arrives is the physical interrupt. So the physical interrupt arrives here, and immediately the hypervisor traps that interrupt and takes control. The hardware puts that interrupt as pending, then the interrupt service routine of the hypervisor takes control and acknowledges the interrupt, and then the interrupt is made active by the hardware. So now the physical interrupt is active, but the virtual interrupt is still inactive; it doesn't exist yet. Then the hypervisor looks at whether the guest corresponding to this interrupt is sleeping or not. If it is sleeping, the physical interrupt remains active until, later on, the hypervisor schedules that guest. But if the guest is not sleeping -- for example, that guest here is sitting on this thing here -- then what the hypervisor does is simulate the arrival of the virtual interrupt by sending the corresponding interrupt to some device called the virtual interrupt controller. And the OS is fooled: it believes that this comes from outside. Actually, it doesn't come from outside; it comes from the hypervisor, which plays the role of the outside. And then things continue exactly as I said: the virtual interrupt arrives, and now the guest does its own business on this interrupt until the moment when the guest finishes treating the interrupt, so it now sends an end of interrupt, which is a virtual end of interrupt. It is trapped by the hypervisor and the hardware, which makes this virtual interrupt inactive, and as a side effect it also makes the physical interrupt inactive. So you see the story of a physical interrupt arriving. That's more or less what is described here.
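The forwarding story just described can be sketched as follows. The boot-time injection from physical interrupts to (guest, virtual interrupt) pairs is from the requirements; the function names and the data layout are hypothetical placeholders.

```python
# Sketch of physical-to-virtual interrupt forwarding by the hypervisor.
PHYS_TO_VIRT = {1: ("g1", 1), 2: ("g1", 2), 3: ("g1", 3),
                4: ("g2", 1), 5: ("g2", 2)}             # injective, fixed at boot

still_active_for_sleeping_guest = []                    # delivered later, at schedule time

def on_physical_interrupt(phys_irq, running_guests):
    # The hypervisor traps the interrupt, acknowledges it, and the hardware makes
    # the physical interrupt active; the virtual one does not exist yet.
    guest, virt_irq = PHYS_TO_VIRT[phys_irq]
    if guest not in running_guests:
        still_active_for_sleeping_guest.append(phys_irq)
    else:
        # the guest believes this comes from outside
        virtual_interrupt_controller_send(guest, virt_irq)

def on_virtual_end_of_interrupt(guest, virt_irq):
    # Trapped by the hypervisor: the virtual interrupt becomes inactive, and as a
    # side effect the corresponding physical interrupt is made inactive too.
    phys_irq = next(p for p, gv in PHYS_TO_VIRT.items() if gv == (guest, virt_irq))
    make_physical_interrupt_inactive(phys_irq)

def virtual_interrupt_controller_send(guest, virt_irq):
    pass  # placeholder for the virtual interrupt controller device

def make_physical_interrupt_inactive(phys_irq):
    pass  # placeholder for the hardware action
```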
So we flew rapidly over the requirements, but again and again, when these requirements are finished we have something which is clear. We sent them out, we discussed with the experts, we discussed between us, and we thought that now we understand. And now we are ready to define our refinement strategy. The refinement strategy is the following: handling of the memory and, later on, handling of the interrupts, because these things are pretty [inaudible]. And in each case, as Rustan said, we are going to start with several physical OSs sitting next to each other, which is formalized like this. They are completely independent; they do their little business. And then we refine this by saying, no, this is not true, this was an abstraction -- because this is the abstraction we want: we want these OSs to believe that they are completely independent -- and in the refinement we virtualize the OSs by means of the hypervisor. And this will also take, as I've said, several levels, in particular for the SLAT, which is not introduced as such at the beginning. So that's what we're doing. Okay. So let me now demo with the Rodin platform. This is the Rodin platform, and I'm going first to describe the memory. It's a project called Hypervisor GRA5; you see that there were lots of tries before, and eventually we got to this level. First of all, a project with Event-B is made of a number of contexts and a number of machines. The contexts are the places where the constants are defined. So let me show you the first context, defining the address space of each guest. We are looking at each guest independently now, and we have three sets -- guests, values, and N, this abstract set of addresses. The values are the values to be stored in the memory, and the guests are the guests. And there are two axioms and a constant, guest address. Guest address is just a bunch of addresses corresponding to each guest, so for each guest it's a set of addresses. And this is not [inaudible]; the guests have some memory. Okay. So now we can go to hypervisor m0. This is the first level. Let me explain a bit what has been done here and also what I will do in the next one. The idea is essentially the following: if we take a program trace of an independent guest, from time to time it writes to the memory -- write a1, v1; write a2, v2; write a3, v3 -- and in between the guest is doing whatever it wants locally. But from time to time it writes to the memory. It also reads from the memory, but we have only considered writing, because it's more difficult; reading is really easy, and we have not done it. And here you see the corresponding program as executed under the hypervisor: the hypervisor traps this write operation and changes it into a hypervisor write -- hypervisor write, hypervisor write. And what we want to be sure of is that this program here and this program here are exactly the same, even though the writing is now done by the hypervisor, in a portion of the memory that is devoted to this guest. So this is essentially what we're doing here, and it is done by defining just a single variable, the guest memory, which is a function from guest and address to value. It is initially empty, and then we have a single event called write, with three parameters: a guest, an address, and a value. g is a guest, a is an address of this guest, v is a value, and at the pair g, a we write the value v into the memory. So this is just very, very simple. Okay.
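Transliterated into Python (the real model is Event-B, checked in Rodin), machine m0 amounts to something like this; the names follow the talk loosely.

```python
# Sketch of machine m0: one variable, the guest memory, and one 'write' event.
guest_memory = {}   # partial function (guest, address) -> value, initially empty

def write(g, a, v):
    # guard: g is a guest and a is an address of this guest (in guest_address[g])
    # action: at the pair (g, a) store the value v
    guest_memory[(g, a)] = v
```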
So now the next step is to slightly, step by step, introduce the notion of the hypervisor. We have a second context, and that context defines the SLAT, which for each guest connects a guest address -- an IPA -- to a host address, a PA. Both IPAs and PAs are addresses, so that corresponds to your question. You see that the SLAT here is just an injective function from pairs of a guest and an address to an address. So I do not define this sort of thing at all at the beginning. And it is a constant, because, you'll remember, it has been entirely defined at boot time. So everything that is defined at boot time is, from the point of view of the model, a constant. And so now we go into machine m1. Machine m1 sees this context c1, and context c1 itself extends context c0, and now we refine the guest memory away. So the guest memory has disappeared from the variables here, and we have only the host memory. So we start doing things at this level now. And we have here a gluing invariant that glues the guest memory to the host memory, and it's very simple: it is just the function composition of the SLAT with the host memory. And because the SLAT is injective, all the parts of the host memory for a certain guest are different from those of the other guests. So the transformation here is very simple. I can show both of them; I can show here the abstraction. The abstraction was guest memory of g, a, and now we have host memory of SLAT of g, a. And because we have the gluing invariant -- guest memory equals SLAT composed with [inaudible], which you see here -- the proof will not be very difficult. Okay. So this is it for m1. Now we go to m2. m2 does the following: in the second refinement we introduce the various cores. We have shared the memory, and now we're going to introduce these people. So we introduce the various cores, and, again, we have an abstract connection between them -- a partial injective connection, partial because we might have more guests than cores in the hypervisor -- and the scheduling precisely is going to assign a guest to a physical processor. And as I say here, we also define the scheduler. This is done through another context here, and that other context is defining -- I'm not sure, maybe not, maybe not; let me have a look. So here -- yeah, m2. Okay, so this one. Yeah, so it uses c3, and c3 is defining -- oh, c3 is defining the set of cores, of course. We have here the set of cores -- it's an abstract set -- all the cores that are used in the hypervisor. And now we have m2. m2 defines a new variable called core-to-guest. It's a partial injection, again, between cores and guests; it gives, in fact, the CPU of the guest, because we suppose that the guests are uni-processor. And we have again here a little modification of write, which becomes fatter and fatter, and now this is host mem of SLAT of core-to-guest of c and a. The guest has disappeared from the parameters of the write operation; it is replaced by c, and the connection is that the g of the previous abstraction is core-to-guest of c. And, of course, this is done only if the guest is not sleeping, and if the guest is not sleeping, then c is in the domain of core-to-guest.
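Before going on to m3, here is the same kind of transliteration of m1 and m2: the guest memory disappears, only the host memory remains, and the gluing invariant says the abstract guest memory is the composition of the SLAT with the host memory; m2 then adds the cores and the partial injection from cores to guests. The concrete values are made up for illustration.

```python
# Sketch of refinements m1 and m2.
SLAT = {("g1", 0x0): 0x1000, ("g1", 0x1): 0x1001, ("g2", 0x0): 0x2000}  # injective, fixed at boot
host_memory = {}
core_to_guest = {0: "g1"}        # partial injection; a core may have no guest (sleeping)

def write_m1(g, a, v):
    host_memory[SLAT[(g, a)]] = v          # refines: guest_memory[(g, a)] := v

def write_m2(c, a, v):
    # guard: c is in the domain of core_to_guest, i.e. the guest on core c is not sleeping
    g = core_to_guest[c]                   # the guest parameter has disappeared
    host_memory[SLAT[(g, a)]] = v

def abstract_guest_memory():
    # the gluing invariant: guest_memory = SLAT composed with host_memory
    return {ga: host_memory[pa] for ga, pa in SLAT.items() if pa in host_memory}
```

In the real development this gluing invariant is exactly what Rodin's proof obligations require the refined write events to preserve.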
And now we go to m3. m3 does the following: in this refinement we are going to disconnect the guest from the SLAT. Before, we had the direct connection between the guest and the SLAT, and now we're going to introduce this base address here. So we slightly enter into this, and we now connect each guest with the base of its SLAT. And we also introduce a base register in each CPU. It's a hardware register in each CPU, and when the scheduling occurs, what happens is that the base register of this CPU is assigned the SLAT base address of the corresponding guest that is now going to be executed on that CPU. And that gives us a little more complication for these things: host mem -- still host mem, of course -- of SLAT of the SLAT register of c. And now we introduce for the first time the scheduling, and precisely in the scheduling, this is what I told you: the SLAT register becomes the guest SLAT address of g. Guest SLAT address of g is the address that is here; it is a table that is, again, defined at boot time for each guest. And this register -- now this is a physical register -- is assigned at schedule time. So now we have connected each guest through the address of its SLAT, and it's put here in this register. Okay. So now we are ready to go here, and what we're going to do now is implement the SLAT. For now the SLAT is just a big table that is here, with this base address at the beginning, but that's not the way it really is, because now we have these trees. And what we have to prove, and it is very important, because we said, and it's absolutely fundamental, that the SLAT is injective -- the SLAT is a connection between IPAs and PAs -- is that the entire thing is in fact implementing an injective function. So it's still one-to-one. So here we enter into some things that are a little more involved, and I have to introduce, of course, this sort of thing. And we want to do it still at an abstract level: we do not want to introduce the bits, but we just want to introduce this sort of thing. So this is defined, I think, in this -- no, this is defined in the next one. So this is defined here. This is the structure of the SLAT as a three-level tree. And we are now going to decompose our addresses. So we have a notion of IPA and a notion of PA, and these are all page addresses, and rather than defining an IPA as 32 bits, we define some projection functions, IPA1, IPA2, IPA3, which we call the 10 bits, the 10 bits, and the 12 bits. Okay? We do not go into the real bits; there is no point in doing this. And, of course, we have to do some more axiomatization, and the first axiom, which is quite important, is that these three projections -- IPA1, IPA2, IPA3 -- define an IPA unambiguously. So if you have two IPAs and they have exactly the same projections, they are the same. And now I define the bases -- I believe these are all the bases for all the guests -- and I define the page addresses at level 2, which correspond to this, and the page addresses at level 3, which correspond to this. And the bases, the page addresses at level 2, and the page addresses at level 3 form a partition of the page addresses. The fact that they are a partition means that the union of the three is exactly the page addresses, and their pairwise intersections are empty. So you see, what I'm constructing here little by little is precisely the notions that will give the basic theorem that this implementation of the SLAT is indeed an injective function. And now I define the word content for a page address, for the 10 bits, and the byte content for the 12 bits, so this corresponds to this word and to this byte. And then there are -- I'm not going to go through them too carefully -- a number of technical theorems here, but the final theorem, the fundamental theorem, which has been proved, is that the SLAT is injective: if we have SLAT of b, i1 equal to SLAT of b, i2 -- so if the correspondence is the same -- then i1 is equal to i2.
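In ordinary mathematical notation (not the Rodin syntax), the two key facts from this context are, roughly, the axiom that the projections determine an IPA and the fundamental theorem that the tree-walk realisation of the SLAT is injective:

```latex
% Projections determine an IPA:
\forall i_1, i_2 \in \mathit{IPA}.\;
  \bigl(\mathit{ipa}_1(i_1) = \mathit{ipa}_1(i_2) \;\wedge\;
        \mathit{ipa}_2(i_1) = \mathit{ipa}_2(i_2) \;\wedge\;
        \mathit{ipa}_3(i_1) = \mathit{ipa}_3(i_2)\bigr)
  \;\Rightarrow\; i_1 = i_2

% Fundamental theorem (injectivity of the implemented SLAT, for a base b):
\forall b, i_1, i_2.\;
  \mathit{SLAT}(b, i_1) = \mathit{SLAT}(b, i_2) \;\Rightarrow\; i_1 = i_2
```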
So I've proved, by being very careful here with those injections, that this entire thing here corresponds to a one-to-one correspondence between this and this. So this is done in the context, and now we go into this machine, m4. By the way, we do not go immediately into this structure; we first introduce a TLB, the famous TLB -- remember, the [inaudible] memory. So now I'm going to say that we sometimes have this short circuit between an IPA and this. This is described here, and so in fact the write event is divided into write 1 and write 2. In write 1 the address is not in the domain of the TLB; therefore we have to do the work painfully, but at the end of it, when we find the result, we update the TLB by evicting a certain member of the TLB. And write 2 is also a refinement of write, and write 2 has got this guard here, that a is in the domain of the TLB; therefore we just use the TLB directly. And the scheduling is not different. And now we go into the last hypervisor machine, and in the last one we are going to split the event. This is very classical in Event-B refinement: we have an event that does everything in one go, and now we are cutting it into pieces. The event write 1, which was the one that was working through this structure, is now going to be divided into write 10, write 11, and write 12, because this is not done atomically anymore. In the abstraction it was done atomically, but now we have three steps, and this is defined here. And in between the steps, the things are stored in, supposedly, some hardware [inaudible]. And this is what is done here. So we have write 10, which writes things into some register; it goes from here to here by taking the upper part of the address, of the intermediate address; and then write 11 and write 12; and write 2, which was using the TLB, is not modified.
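Reusing the split, Tlb, and host_memory sketches from above, the splitting of write 1 into three non-atomic steps, with hardware registers holding the intermediate results, looks roughly like this; the register and function names are illustrative only.

```python
# Sketch of the last refinement: write1 cut into write10 / write11 / write12.
reg_l2_addr = None    # supposed hardware register: level-2 page address between steps
reg_l3_addr = None    # supposed hardware register: byte-page address between steps

def write_10(base_page, ipa):
    global reg_l2_addr
    i1, _, _ = split(ipa)                  # take the upper 10 bits of the intermediate address
    reg_l2_addr = base_page[i1]

def write_11(pages, ipa):
    global reg_l3_addr
    _, i2, _ = split(ipa)                  # middle 10 bits
    reg_l3_addr = pages[reg_l2_addr][i2]

def write_12(ipa, v, tlb):
    _, _, i3 = split(ipa)                  # lowest 12 bits: offset into the byte page
    host_memory[reg_l3_addr + i3] = v
    tlb.insert(ipa >> 12, reg_l3_addr)     # update the TLB, as the atomic write1 did
```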
And now we go on to the last one, which is -- oh, no, sorry, this was the last one. Okay. So this is the end of it. And all the proofs have been done. Here are the statistics -- oops -- that we have for this. You see it's not small: we had 176 proofs for proving all that. All the proof obligations -- the maintenance of the invariants and all the theorems -- have been generated by a tool called the proof obligation generator, and they were discharged automatically except for nine of them, where I had to do an interactive proof. An interactive proof is just giving some clues to the automatic prover so that it can finish discharging. I have not tried to optimize things, so it might be better, but I've done the things as they are. So this is finished for the memory. For the interrupts we have a similar status; it's not all done, but you see -- and what is important to see here in those events is that some events correspond to the hardware and some events correspond to what the hypervisor does, and actually it is not completely clear-cut what is done by the future software of the hypervisor and what is done by the hardware. By the way, a machine with the SLAT makes things far more convenient for the virtualization. So now we go into the interrupts --
>> Rustan Leino: Should we skip to the conclusions, or do you want to --
>> Jean-Raymond Abrial: Yeah, we should go to the conclusions directly. If you are interested, these slides can be given to you. Here, just let me give you the final numbers for the interrupts. We have 205 total proofs, and here we have a few more proofs done manually -- 35 of them are done manually, because in the last one alone there are 22. I did not have enough time to do it all, but we have written the requirements document, so that is available, and those slides are available. Okay. So let's conclude now.
>> Rustan Leino: All right. So in conclusion, what you've seen is the flavor of the way that a top-down development is done using a tool like Event-B. Some common themes are that details are introduced gradually, so at each point you need to know only so many things. For example, with the bits: you don't need to know exactly that things are stored in bits; you can work at a more abstract level, and that's very convenient. If you start from the code, you have all of the concrete details to look at right away, and then you'd like to try to abstract from there, and that seems to be much harder. In what you've seen, we also have the interrupt part, which is similar in flavor to the development of the memory manager. But there are many more things that are missing. The model is still quite abstract, so we need to continue going down into more concrete levels so that we can eventually get to code that could actually be executed. You saw that in the refinement you get more and more events. As you get more and more events, at some point the final thing is going to have some events being executed by hardware and some of them executed by software. And in the process there have been many, many things that we have puzzled over, where we have to sort of make up our own rules for what is hardware and what is software, and then we've consulted with experts and we say, ah-hah, well, maybe we really should move this, this is really a hardware kind of thing; and, oh, there's such-and-such a piece of hardware, like the TLB or whatever, that we should consider in between. But the idea in the end is that when you look at each one of the events, you can see that this event is one of the software instructions that you can execute on the machine, or this is a task, something that is performed by the hardware. And when we get to that point, then the idea is that one can actually do code generation for those software pieces. And, of course, many of those events at the end are going to be whatever the guest OSs are doing, and we want to be completely independent of what the guest OSs are doing. In the end, also, when we get to that point, we have to go back to the requirements document and check off each requirement to make sure that we are really living up to what that document says. You can think of the requirements document as a contract between the developers and the users of a system, and we're trying to get those right. And there are some other pieces that you would want to have in a realistic hypervisor, like timers, for example, that are missing from our model. So this is, from our standpoint, a Gedankenexperiment -- I mean, these refinements have been done before, but we're developing it as something where we can see all the pieces come together, and the tooling is there. We have expertise in our group in developing tools and automatic reasoning systems, so maybe there are some things that can be done there as well for making a process like this more automatic.
So with that, we'll open it up for questions. Thank you very much. [applause]
>>: What are the specifics of the instructions that [inaudible] come into the picture?
>> Rustan Leino: Right. So what would happen is that there's some event that models the operations of each guest OS. What we saw in the memory model here is only one instruction; that is, we're looking at it as if each guest OS is just doing write instructions. But there would also be skip instructions, that is, things that the guest OS would do that are of no concern to us -- they can do whatever they want. And what we need to show at some point is that if we then have real instructions of a piece of hardware, of a processor, those instructions are correct refinements of skip. In other words, they don't interfere with the operation of the hypervisor.
>>: Right. But if a CPU has [inaudible], where does that come into --
>> Rustan Leino: So imagine a --
>>: [inaudible]
>> Rustan Leino: So take an operation --
>>: [inaudible] so in this model they just know that there's a SLAT? They don't know who does it?
>> Rustan Leino: So think of it this way. That skip event that needs to be refined by instructions would be executed by each guest. If the guest attempted an operation that can only be done by the hypervisor -- that is, you have to put the processor in hypervisor mode to be able to execute it -- then what we would need to do is simulate that; that is, when we're doing that refinement, we need to model the hardware execution of that instruction by giving a trap, doing something that prevents the guest from mucking with, for example, the SLAT, or mucking directly with the physical addresses. A guest would only be allowed to muck with the intermediate physical addresses.
>>: [inaudible]
>> Rustan Leino: Wait. The N that you saw in the --
>> Jean-Raymond Abrial: That is used for the addresses.
>> Rustan Leino: Right. Uh-huh.
>>: [inaudible]
>>: [inaudible]
>> Jean-Raymond Abrial: Yeah, that would be some refinement.
>>: So if you used natural numbers and refined from that [inaudible] or something, you had a domain of natural numbers --
>> Rustan Leino: So the way to --
>>: It wouldn't correspond to the [inaudible].
>> Rustan Leino: So if you have an abstract set that lives in the context, then what you're doing is a development, a refinement, that is parametric in what those sets are.
>>: [inaudible]
>> Jean-Raymond Abrial: No, it would not be the natural numbers, it would be a finite portion. Yeah, yeah. Sure, sure, sure.
>>: Do you have any feedback from the hypervisor project or the [inaudible] verification of it -- like, the kinds of things that you were doing here, are these where errors were found in the hypervisor, or would such a development have prevented --
>> Rustan Leino: We don't have such -- right. You can check for that. Maybe [inaudible] knows more. But the feedback -- I mean, we have consulted with two of the people who were involved in that verification, that is, in verifying the Hyper-V, to get a sense of what sort of models we should have, what kind of hardware we can expect, and things like that. But, yeah, I don't know. You'd have to check with them to see what sort of errors, whatever, they discussed.
But one thing that I should have said is that the reason the hypervisor seems like such a good thing to do an experiment like this with is that for many pieces of software -- dancing clowns on web pages, for example -- we don't really care if they're really correct or not. But here, the hypervisor is something where you really care that the machines are independent. And, furthermore, if you try to debug it with standard techniques, you're operating at such a low level that it's very difficult to understand, at that level, what goes wrong. So, therefore, trying to use a technique that is designed to start from abstractions and get down to correct things seems worthwhile here. And you notice that after these five weeks we don't have any code; it's not there yet. Whereas if you were to do it the traditional way, you certainly would have written code by now.
>>: [inaudible]
>> Rustan Leino: Jason, did you also have a question?
>>: Yeah. So I was wondering where the virtual [inaudible] translation sits. Is that before or after?
>> Rustan Leino: You mean from the -- so you're thinking each guest operating system probably has a bunch of processes running, and when --
>>: [inaudible] so that sits before --
>> Rustan Leino: Right. So that sits before. All of the addresses that we -- we being the hypervisor -- would get are already requests for a particular intermediate physical address. So what each guest operating system presumably would do is have virtual addresses that it gives to each of its processes, and it maps them, in fact, using page tables and TLBs very similar to what we have here, but it maps them into an intermediate physical address.
>>: Yes. But my other question is, that's usually done directly by the hardware [inaudible], so you're able to trap it after the hardware --
>> Rustan Leino: Right. All of that happens in guest mode. And then that whole process in the end comes up with one IPA, one intermediate physical address, and that's what we get. So we don't actually care -- a guest OS could decide not to use that virtualization mechanism, or come up with its IPAs in any kind of way. But, of course, the most likely thing is that it would do it in just that way.
>>: So as a second --
>>: [inaudible]
>>: So second level. And then there's -- I can't imagine that that would be fast at all unless there was some specialized hardware also doing that process.
>> Rustan Leino: You're right. I mean, we would assume hardware like that. And actually we've gone through and thought a lot about how that works, but for our modeling it plays no role. Well, actually, that's not right: in the end it will play a role, because when we have to take something like a skip action, something that is done by each guest, we need to be able to refine it into what the hardware does for those sorts of things. But as far as the translation from IPAs to PAs is concerned, we don't care how the guest comes up with its IPAs.
>> Jean-Raymond Abrial: [inaudible] there are two TLBs, the TLB for the guest and the TLB for the hypervisor. The TLB for the guest, we don't care about; this is the business of the guest. The TLB of the guest maps a virtual address of the guest to an IPA, and then the guest wants to write directly to this IPA, and this is the part that is trapped by the hypervisor.
>>: This is developed specifically for hardware that has the ability to trap at the second level?
>> Rustan Leino: We're assuming that there's such a SLAT.
>>: [inaudible]
>> Rustan Leino: You could refine --
>>: You have to believe that you've got the chance to get in there --
>>: [inaudible]
>>: That's what I'm saying: you have to know that you can trap it after. Otherwise you would have to simulate errors [inaudible] --
>>: [inaudible]
>> Rustan Leino: Right. And, by the way, we didn't look at the details about the interrupts in this talk, but the interrupts involve very similar things: because of the virtualization that's going on, the hypervisor gets the physical interrupts, it will then do things to mark them as being active -- that is, that they are being handled -- and then it forwards them on to the virtualization hardware that goes to each guest. And at that point, what the guest does -- whether the guest will take that interrupt, or what it will do in its interrupt processing routine -- we don't care. We know a few things; for example, there are a bunch of priorities that guests can set, and when you set those priorities, you can get a stack of interrupts. We're modeling that. And the guests are then supposed to peel these things off -- I mean, pop them in the opposite order from the way that they were pushed. So we're guessing that the hardware would trap any violations of such a thing. But, again, most of that is just independent of what the hypervisor does. The hypervisor just tries to simulate the physical pieces of hardware, in the virtualized way, for each guest. By the way, I should have said something about the initialization: the one thing that we have not looked at is how you initialize all of these tables and the interrupt maps and so forth. There our understanding is that there's something in the BIOS or so that would tell the hypervisor how to set things up, but we've not looked at that. That needs to be done; that's a missing piece.
>> Jean-Raymond Abrial: One thing that is very important in the separation of future software events and hardware events: the software events will give rise to code, but the hardware events are important too, because we have to check that the physical hardware corresponds to the model we've made of it. So we have to dig into the documentation, or dig into the real physical hardware, to see whether our events match, because our software will be correct only with regard to those physical events corresponding to the hardware. Now, if the hardware is doing something different, then we have a problem, of course. So those events are also important. And I think, going even further, it could also be used by the hardware people: they could formalize the future hardware and then implement it in the circuit.
>> Rustan Leino: All right. Anything else? All right. Thanks very much.
>> Jean-Raymond Abrial: Thank you. [applause]